Content area
Join queries lie at the heart of relational data systems. Traditionally, relational engines execute these queries by decomposing them into sequences of binary joins, guided by cost-based optimizers. While this paradigm has dominated for decades, it increasingly struggles to meet the demands of contemporary workloads—characterized by large-scale, skewed data, imprecise or missing statistics, and increasingly rich but intricate query semantics. These challenges expose both theoretical and practical limitations of the established approach, often resulting in brittle query optimizers and significantly suboptimal performance.
In parallel, the research community has made substantial progress in developing join algorithms with strong theoretical guarantees. However, these algorithms are often designed under idealized assumptions such as purely join-based queries or uniformly sized relations—divorced from the broader realities of real-world queries. In practice, relational queries often involve additional constructs like projections, antijoins, or group-by aggregation, operate over skewed data, or must respect a handful of integrity constraints. In other emerging scenarios—such as multi-query optimization, incremental query maintenance, or recursive, loop-containing queries—existing techniques frequently fall short. These settings further complicate the landscape and demand a more nuanced understanding of the interplay between query structure and data characteristics.
This dissertation seeks to bring these join algorithms closer to the practical needs. It presents a suite of novel join algorithms tailored to complex, real-world queries—offering principled performance guarantees without sacrificing generality. Along the way, it contributes new insights and design principles that inform how future query optimizers can be made more robust, adaptable, and interpretable. Collectively, these contributions pave the way toward a new generation of query optimization—firmly grounded in theory and effective in practice.