A Database Management System (DBMS) is a crucial system for storing, retrieving, and analyzing large amounts of data. As an important component of data-intensive applications, DBMSs are extensively deployed to drive trillions of electronic devices and internet services. As a result, any vulnerability in a DBMS could potentially affect a large number of people.
Researchers have studied DBMS security for some time, yet a non-negligible number of DBMS bugs are still discovered today. Detecting bugs in a DBMS poses three main challenges. First, it is difficult to generate valid DBMS inputs because the Structured Query Language (SQL) used by DBMSs is diverse and complex. Queries rejected by the DBMS sanity checks never reach the interesting internal behaviors of the DBMS, making them less likely to trigger bugs. Second, without prior knowledge of the SQL features implemented by each DBMS, it is challenging to direct testing resources towards generating feature-rich queries, which are handled by complicated DBMS back-end code. Simple SQL queries exercise little of this code, so generating them is an inefficient way to detect bugs. Third, certain DBMS bugs, such as logic errors, are triggered silently. Unlike memory corruption bugs, which typically cause noticeable and traceable application crashes, logic errors lead to incorrect DBMS outputs, which have no easy-to-capture pattern for bug detection.
To address the first challenge, this dissertation introduces ParserFuzz, a syntax-based SQL generation tool designed specifically for DBMS bug detection. The syntax-based SQL generation technique guides the testing tool in generating diverse, high-validity queries. Unlike previous tools that use pre-defined SQL templates or SQL unit tests as inputs, ParserFuzz learns all the possible SQL syntax from the DBMS's built-in grammar and automatically creates queries that cover every syntax form the DBMS supports. In addition, ParserFuzz employs a novel grammar rule categorization algorithm to handle the grammar path explosion problem caused by the recursive SQL grammar. Combined with code coverage feedback, ParserFuzz detects 81 new unique bugs in 5 popular DBMSs.
To address the second challenge, this dissertation introduces SQLBull, a syntax-based SQL query generator that employs a new Bottom-up SQL generation technique. The new technique redirects more testing resources into exploring the feature-rich SQL grammar. Specifically, the exploration of the SQL grammar begins with one interesting grammar rule that outlines the syntax of a feature-rich SQL functionality. The generator then backtracks from that rule up to the grammar root (hence Bottom-up) to create a syntax path that reaches the interesting rule. Multiple Bottom-up generated syntax paths are then expanded and merged to create diverse and feature-rich SQL queries for testing. With the Bottom-up SQL generation technique, SQLBull achieves a more focused exploration of the feature-rich SQL grammar, making it more likely to expose DBMS bugs. Furthermore, SQLBull demonstrates that Bottom-up grammar exploration is inherently capable of handling recursive grammar. Lastly, SQLBull found 63 unique zero-day bugs in 5 different DBMSs.
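The Bottom-up idea can be illustrated with a minimal sketch. It is not SQLBull's implementation: the toy grammar, the function names, and the choice of breadth-first backtracking with a "fewest nonterminals" expansion for off-path symbols are all assumptions made for illustration. The sketch starts at a target rule (a hypothetical `window_func`), walks up to the root while skipping rules it has already visited (which is how it sidesteps the recursive `expr` rule), and then expands the resulting path into one query. Merging several paths is not shown.

```python
from collections import deque

# Hypothetical toy grammar: nonterminal -> list of alternatives (lists of symbols).
GRAMMAR = {
    "stmt": [["select_stmt"]],
    "select_stmt": [["SELECT", "expr", "FROM", "table", "opt_where"]],
    "opt_where": [[], ["WHERE", "expr"]],
    "expr": [["column"], ["window_func"], ["expr", "+", "expr"]],  # recursive rule
    "window_func": [["ROW_NUMBER()", "OVER", "(", "ORDER BY", "column", ")"]],
    "column": [["c0"]],
    "table": [["t0"]],
}
ROOT = "stmt"

def parent_index(grammar):
    """Map each nonterminal to the (head, alternative, position) sites that use it."""
    parents = {}
    for head, alts in grammar.items():
        for i, alt in enumerate(alts):
            for pos, sym in enumerate(alt):
                if sym in grammar:
                    parents.setdefault(sym, []).append((head, i, pos))
    return parents

def bottom_up_path(grammar, target, root=ROOT):
    """Backtrack from target to root; return the sites ordered from root downward."""
    parents = parent_index(grammar)
    queue, seen = deque([(target, [])]), {target}
    while queue:
        sym, path = queue.popleft()
        if sym == root:
            return path
        for head, i, pos in parents.get(sym, []):
            if head not in seen:  # visiting each rule once avoids looping on recursion
                seen.add(head)
                queue.append((head, [(head, i, pos)] + path))
    return None  # target is unreachable from the root

def expand_min(grammar, sym):
    """Expand an off-path symbol using the alternative with the fewest nonterminals."""
    if sym not in grammar:
        return [sym]
    alt = min(grammar[sym], key=lambda a: sum(s in grammar for s in a))
    return [tok for s in alt for tok in expand_min(grammar, s)]

def expand_along(grammar, sym, path):
    """Expand sym, following the bottom-up path where it applies."""
    if not path:
        return expand_min(grammar, sym)
    head, i, pos = path[0]
    tokens = []
    for j, s in enumerate(grammar[head][i]):
        if j == pos:
            tokens += expand_along(grammar, s, path[1:])
        else:
            tokens += expand_min(grammar, s)
    return tokens

path = bottom_up_path(GRAMMAR, "window_func")
query = " ".join(expand_along(GRAMMAR, ROOT, path))
print(query)
```

Because the path is fixed before expansion begins, every generated query is guaranteed to contain the target rule, whereas a top-down random walk from the root may never select it.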
Finally, to address the third challenge, this dissertation proposes SQLRight, a general platform for detecting logic errors in DBMSs. SQLRight offers an easy-to-use interface for developers to define logic bug oracles, which check the DBMS outputs against their expected results to expose potential logic errors. By combining code coverage feedback, a general oracle interface, and validity-oriented mutation techniques, SQLRight detects 18 logic errors in SQLite and MySQL.
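To make the notion of a logic bug oracle concrete, the following minimal sketch checks one query against a semantically equivalent reformulation. It does not reproduce SQLRight's interface: the function name `tlp_oracle`, the table, and the predicate are hypothetical, and the check shown (ternary logic partitioning, a known oracle from the DBMS testing literature) is only one example of what such an oracle might verify. It relies on the fact that, for any predicate `p`, each row satisfies exactly one of `p`, `NOT p`, or `p IS NULL`, so the three partitions together must equal the unfiltered result.

```python
import sqlite3
from collections import Counter

def tlp_oracle(conn, table, predicate):
    """Return True when the three predicate partitions rebuild the full table.

    A False result means the DBMS returned inconsistent rows without crashing,
    which is the silent symptom of a logic error.
    """
    base = Counter(conn.execute(f"SELECT * FROM {table}").fetchall())
    parts = Counter()
    for cond in (f"({predicate})", f"NOT ({predicate})", f"({predicate}) IS NULL"):
        rows = conn.execute(f"SELECT * FROM {table} WHERE {cond}").fetchall()
        parts.update(rows)
    return base == parts  # compare as multisets, so row order is irrelevant

# Hypothetical test database with NULLs to exercise three-valued logic.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t0 (c0 INTEGER, c1 TEXT)")
conn.executemany("INSERT INTO t0 VALUES (?, ?)", [(1, "a"), (None, "b"), (3, None)])

consistent = tlp_oracle(conn, "t0", "c0 > 1")
print("oracle passed" if consistent else "potential logic bug")
```

On a correct DBMS the oracle passes for every predicate, so any failure is worth reporting without needing a crash or a hand-written expected result.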