Context: User stories play a crucial role in agile software development because of their structured format and ease of implementation. However, development teams face the challenging task of managing the variety of information required from multiple sources to craft user stories manually. Furthermore, poor-quality user stories can hinder communication among team members, potentially causing delays or errors in the development process. Goal: This thesis investigates the state of the art in the automatic generation of user stories and proposes text generation models to assist in writing user stories within Agile Software Development (ASD) projects. We hypothesize that employing these models can help software practitioners write user stories more efficiently and with improved quality. Method: We used a range of research methods to construct and evaluate this thesis. First, we conducted a Systematic Literature Review (SLR) to summarize the evidence on the topic. Based on the findings of the SLR, we introduced our first two text generation models for user stories (N-gram and GPT) and compared them using a quantitative framework of metrics. Subsequently, we improved the N-gram model and performed a controlled experiment, followed by a survey, to evaluate the use of text generation models for user stories. Results: The SLR found a shortage of user story corpora to support the implementation of text generation models for user stories, as well as a wide variety of Natural Language Processing (NLP) techniques and Machine Learning (ML) algorithms used to specify user stories automatically. Only a few studies address the quality of the user stories generated by the approaches they present.
Quantitative evaluation of the initial N-gram and GPT models using the BLEU, ROUGE, and BERTScore metrics showed that, while the GPT models excelled at producing more comprehensive user stories, the N-gram models demonstrated a higher degree of semantic sensitivity. Finally, our controlled experiment revealed that the upgraded version of the N-gram model enhanced the consistency and uniformity of the user stories compared to manual writing, and it yielded important insights into how user stories are used. Conclusion: The use of text generation models to support the writing of user stories is promising. These models accelerated the composition process and improved the quality of the resulting user stories. We encourage further investigation into refining the N-gram technique or training other Large Language Models (LLMs) to support the writing of user stories in different contexts and with varied templates.