Formal software verification is essential for ensuring the correctness and reliability of critical systems, yet its adoption faces significant barriers: manually writing method contracts, loop invariants, and other verification constructs is complex and labor-intensive. This dissertation addresses this limitation by developing a Visual Studio Code extension that automates the generation of formal specification and verification constructs using multiple artificial intelligence providers.
The work extends an earlier VSCode extension for Dafny, transforming an initial prototype into a robust and scalable platform for formal verification assistance. The solution implements a multi-provider architecture that integrates AI services, including OpenAI, Claude, and DeepSeek, through a unified interface, with intelligent fallback mechanisms that keep the tool available even when an individual service is disrupted. Firebase integration provides dynamic configuration management and secure credential storage, making the tool suitable for both academic and industrial environments.
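As a minimal sketch of the fallback mechanism, the TypeScript fragment below shows one way such a provider chain could be organized; the SpecProvider interface and the function names are hypothetical simplifications, not the extension's actual API.

```typescript
// Hypothetical sketch of a multi-provider fallback chain; the actual
// extension's interfaces and provider wrappers may differ.
interface SpecProvider {
  readonly name: string;
  // Ask the provider to annotate a Dafny program with specifications.
  generateSpecs(program: string): Promise<string>;
}

// Try each provider in order, falling back to the next on failure,
// so a single service disruption does not block the user.
async function generateWithFallback(
  providers: SpecProvider[],
  program: string,
): Promise<string> {
  const errors: string[] = [];
  for (const provider of providers) {
    try {
      return await provider.generateSpecs(program);
    } catch (err) {
      errors.push(`${provider.name}: ${String(err)}`);
    }
  }
  throw new Error(`All providers failed:\n${errors.join("\n")}`);
}
```

Ordering the provider list by expected quality, cost, or latency then reduces the fallback policy to a configuration choice.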
In an experiment with a dataset of 100 Dafny programs of varying complexity, a multi-provider ensemble combining Claude Sonnet 4, DeepSeek-V3, and GPT-4.1 achieved the best performance, generating correct specification and verification constructs (pre/postconditions, loop invariants, and auxiliary predicates and functions) for 80% of the programs in a single attempt and for 85% within three attempts. Generated pre/postconditions are checked by the Dafny verifier against a set of test assertions, and generated loop invariants are checked against the methods' pre/postconditions.
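To make this checking methodology concrete, the sketch below validates a candidate annotated program by invoking the Dafny verifier on it together with a test assertion. It assumes a Dafny 4-style CLI (dafny verify) on the PATH; the file name and the embedded Dafny snippet are illustrative, not taken from the dissertation's dataset.

```typescript
import { writeFileSync } from "node:fs";
import { spawnSync } from "node:child_process";

// Illustrative example of AI-generated constructs: a postcondition and
// loop invariants for a summation method, plus a test assertion.
const annotatedProgram = `
method Sum(n: nat) returns (s: nat)
  ensures s == n * (n + 1) / 2   // generated postcondition
{
  s := 0;
  var i := 0;
  while i < n
    invariant 0 <= i <= n               // generated loop invariant
    invariant s == i * (i + 1) / 2      // generated loop invariant
  {
    i := i + 1;
    s := s + i;
  }
}

method Test() {
  var s := Sum(4);
  assert s == 10;  // test assertion checked by the verifier
}
`;

// Run the Dafny verifier on the candidate; a zero exit code means the
// generated constructs were accepted (assumes Dafny's CLI is installed).
writeFileSync("candidate.dfy", annotatedProgram);
const result = spawnSync("dafny", ["verify", "candidate.dfy"], {
  encoding: "utf8",
});
console.log(result.status === 0 ? "verified" : result.stdout + result.stderr);
```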
Experimental validation through a controlled crossover-design study, involving 7 users and a total of 42 formal specification and verification tasks, demonstrates the effectiveness of the proposed approach. Results show a System Usability Scale score of 74.6/100, indicating good user acceptance. Comparative analysis shows significantly higher success rates for AI-assisted approaches (85.7%) than for manual methods (42.1%). Effort evaluation shows a 43.8% productivity gain, corresponding to average completion times of 6.25 minutes for tool-assisted tasks versus 11.12 minutes for manual ones.
This work further demonstrates the viability of multi-provider architectures for AI-assisted development tools and establishes replicable experimental methodologies for evaluating formal verification tools. The results suggest that intelligently integrating multiple AI providers can significantly improve the accessibility and effectiveness of formal software verification.