Python and its ecosystem have become integral to modern software development. Despite Python’s popularity, CPython, the reference implementation, has significant performance limitations compared with implementations of other widely used programming languages. In particular, although CPython supports threads, it uses a global interpreter lock (GIL) that allows only one thread at a time to execute Python bytecode. Because the GIL serializes execution, developers, intentionally or not, often overlook subtle synchronization details when writing Python code, and the long-standing GIL makes it unlikely that such unsynchronized code ever manifests as bugs. In 2023, CPython finally added a build option to compile a “free-threaded” variant without the GIL, and in that build these latent data races readily surface.
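To make the kind of unsynchronized code discussed above concrete, the following sketch shows a classic read-modify-write data race on a shared counter and its lock-based fix. The names `Counter` and `hammer` are hypothetical, chosen for illustration only; this is not code from CPython or PyTsan. The GIL makes a thread switch between the read and the write of `self.value` uncommon, but it does not make the update atomic, so lost updates remain possible and become far more likely without the GIL.

```python
import threading


class Counter:
    """Hypothetical shared counter illustrating an unsynchronized read-modify-write."""

    def __init__(self) -> None:
        self.value = 0
        self._lock = threading.Lock()

    def increment_racy(self) -> None:
        # Data race: reading self.value and writing it back are separate steps,
        # so two threads can read the same old value and one update is lost.
        self.value = self.value + 1

    def increment_locked(self) -> None:
        # Holding the lock makes the read and the write one critical section.
        with self._lock:
            self.value = self.value + 1


def hammer(increment, n_threads: int = 4, n_iters: int = 100_000) -> None:
    """Hypothetical helper: call `increment` n_iters times from each of n_threads threads."""

    def worker() -> None:
        for _ in range(n_iters):
            increment()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


if __name__ == "__main__":
    racy = Counter()
    hammer(racy.increment_racy)
    # May print 400000 under the GIL and less on a free-threaded build;
    # no particular result is guaranteed.
    print("racy:", racy.value)

    safe = Counter()
    hammer(safe.increment_locked)
    print("locked:", safe.value)  # always 400000
```

The racy variant may pass every test on a GIL build, which is exactly why such bugs can stay hidden for years; only the locked variant is correct regardless of how the threads interleave.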
In this thesis, we present PyTsan, a dynamic data race detector for Python that methodically detects hard-to-find data races. When run on the standard library test suite of CPython 3.10, PyTsan reports 29 data races. Two of these were also reported by others experimenting with CPython’s new free-threaded build, but had otherwise gone undetected for over 10 years. Furthermore, PyTsan shows that for one of the “fixed” bugs, the merged resolution is insufficient on architectures with weaker memory ordering than x86/amd64, such as ARM or RISC-V.