
Abstract

Computers play an indispensable role in our daily lives, providing critical functions and services in areas such as communication, transportation, and finance. At the heart of modern computing lies a fundamental component: the software binary, a complex sequence of ones and zeros that determines how specific tasks are executed on a computer. Ensuring the security and correctness of software binaries is of paramount importance, as vulnerabilities can have far-reaching consequences, potentially affecting financial systems and even human lives. However, analyzing software binaries at an industrial scale remains a significant challenge, primarily because it is difficult to meet three essential design requirements at once: rigor, non-intrusiveness, and scalability.

This dissertation proposes a binary-centric solution to enhance industrial-scale software binary analysis, addressing these three requirements by contributing to several fundamental binary analysis techniques: binary lifting, binary similarity analysis, and fuzzing.

First, we introduce a novel binary lifter that translates software binaries directly into high-quality compiler-level intermediate representations (IRs) compatible with existing static analyzers, thereby enabling rigorous bug detection. Second, we propose a parallel binary lifting technique that addresses the scalability limitations of traditional lifters, making more efficient use of multi-core machines and scaling to extremely large binaries. Building on the IR code obtained through binary lifting, our third contribution is a binary similarity analysis technique that identifies third-party code within software binaries, enabling the reuse of existing knowledge and the identification of zero-day vulnerabilities. Finally, moving beyond static analysis, our fourth contribution is a dynamic approach: a program-adaptive parallel fuzzer that executes programs at runtime to efficiently uncover exploitable bugs with very low false-positive rates.

Together, these contributions constitute a systematic solution for software binary analysis, capable of seamless integration into modern software development lifecycles to detect and prevent defects at an early stage. Our approaches have demonstrated tangible real-world impact, being deployed in CI/CD pipelines at major organizations to perform daily software quality checks. Using these techniques, we have successfully identified hundreds of high-risk defects in both industrial software products and open-source projects. Furthermore, our advancements in fundamental binary analysis techniques open avenues for exploration and innovation in related areas of research.

Details

Title
Towards Industrial-Scale Software Binary Analysis
Number of pages
201
Publication year
2025
Degree date
2025
School code
1223
Source
DAI-B 87/6(E), Dissertation Abstracts International
ISBN
9798265488381
University/institution
Hong Kong University of Science and Technology (Hong Kong)
University location
Hong Kong
Degree
Ph.D.
Source type
Dissertation or Thesis
Language
English
Document type
Dissertation/Thesis
Dissertation/thesis number
32433165
ProQuest document ID
3288202058
Document URL
https://www.proquest.com/dissertations-theses/towards-industrial-scale-software-binary-analysis/docview/3288202058/se-2?accountid=208611
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Database
ProQuest One Academic