Modern software programs often make use of multiple programming languages. Each language has its own set of advantages and disadvantages. High-level languages like Java and Python allow rapid prototyping and fast development speeds without having to worry about low-level details such as memory management. Low-level systems programming languages like C allow for easier interfacing with hardware and can be used to write very high performance code. However, they require the programmer to manage memory carefully, lest they introduce critical memory safety issues.
Because languages differ in their semantics and security considerations, programmers who context-switch between several languages are prone to introducing security issues. For example, a programmer who is used to array accesses being bounds-checked by the language may introduce a spatial memory-safety issue through an out-of-bounds access. One who is used to garbage collection and is unfamiliar with the nuances of manual memory management may introduce a use-after-free vulnerability.
In this thesis, we begin with a broad survey of how multi-language programs and foreign function interfaces are implemented. We then examine in detail how complexity can leak in these programs. Next, we apply existing vulnerability discovery techniques on Android to find cross-language bugs at the boundary between Java and C/C++, which are connected through the Java Native Interface (JNI).
Finally, this thesis presents a novel technique that eases the burden of implementing concolic testing for any programming language. Using a simple debugging feature, line-by-line execution, together with a Large Language Model (LLM), we produce constraints in a verifiable manner. We show these concolic testing engines to be sound and use them for hybrid fuzzing of cross-language programs.