This thesis investigates the challenges and opportunities presented by the increasing trend of using multiple specialized models, referred to as operational models, to address complex data analysis problems. While such an approach can enhance predictive performance for specific sub-problems, it often leads to fragmented knowledge and makes it difficult to understand overarching organizational phenomena. This research focuses on synthesizing the knowledge embedded within a collection of decision tree models, chosen for their inherent interpretability and suitability for knowledge extraction. Consider, for example, a company with chain stores or a university with diverse programs, each using a dedicated prediction model (for sales or dropout, respectively). While these localized models are important, a global, organization-wide perspective is also valuable. However, managing many operational models, especially for analysis across programs or stores, can be overwhelming.
A methodology, framed within a comprehensive framework, is introduced to merge sets of operational models into consensus models. These consensus models are directed towards higher-level decision-makers, enhancing the interpretability of the knowledge generated by the operational models. The framework, named Inmplode, addresses common challenges in model merging and presents a highly customizable process. This process features a generic workflow and adaptable components, detailing alternative approaches for each subproblem encountered in the merging process.
The framework was applied to four public datasets from diverse business areas and to a case study in education using data from the University of Porto. Different model merging approaches were explored in each case, illustrating various instantiations of the process. The model merging process revealed that the resulting consensus models are frequently incomplete, meaning they cannot cover the entire decision space, which can undermine their intended purpose. To address the issue of incompleteness, two novel methodologies are explored: one relies on the generation of synthetic datasets followed by decision tree training, while the other uses a specialized algorithm designed to construct a decision tree directly from aggregated (i.e., symbolic) data. The effectiveness of these methodologies in generating complete consensus models from incomplete rule sets is evaluated across the five datasets. Empirical results demonstrate the feasibility of overcoming the incompleteness issue, contributing to knowledge synthesis and decision tree modeling. However, trade-offs were identified between completeness and interpretability, predictive performance, and the fidelity of consensus models.
Overall, this research addresses a critical gap in the literature by providing a comprehensive framework for synthesizing knowledge from multiple decision tree models, focusing on overcoming the challenge of incompleteness. The conclusions have implications for organizations seeking to use specialized models while maintaining a holistic understanding of the analyzed phenomenon.