
Abstract

Scorecards are widely used in decision-making due to their transparency, simplicity, and interpretability. A scorecard is a predictive model that assigns points to features based on their contribution to an outcome. The sum of these points produces a total score that can be used to make a classification or estimate the probability of a given event.
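As a rough illustration of the idea described above, the sketch below sums points over a few rules and maps the total to a probability. The feature names, thresholds, point values, intercept, and the logistic link are all hypothetical assumptions chosen for illustration; none of them are taken from the thesis.

```python
import math

# Hypothetical rules: (feature, threshold, points). A rule fires when the
# feature value is greater than or equal to its threshold.
RULES = [
    ("age", 60, 2),
    ("tumour_size_mm", 20, 3),
    ("prior_surgery", 1, -1),
]
INTERCEPT = -3  # hypothetical offset added before the logistic link


def total_score(record):
    """Sum the points of every rule that fires for this record."""
    return sum(points for feature, threshold, points in RULES
               if record[feature] >= threshold)


def probability(score):
    """Map a total score to a probability, assuming a logistic link."""
    return 1.0 / (1.0 + math.exp(-(INTERCEPT + score)))


patient = {"age": 65, "tumour_size_mm": 10, "prior_surgery": 1}
score = total_score(patient)  # age fires (+2), prior_surgery fires (-1)
print(score, probability(score))
```

Because every rule contributes a small integer, the total can be computed by hand, which is the property that makes this kind of model easy to audit.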

Traditional approaches to scorecard construction rely on a two-step process: first, continuous features are discretised and encoded; then, a predictive model calculates weights (points) for all features. Furthermore, current state-of-the-art methods focus solely on weight estimation, assuming a prior pre-processing step for feature discretisation. This leaves a gap in the development of algorithms that combine both steps into a unified process. To address this, this thesis introduces Infinitesimal Bins, a novel discretisation algorithm designed to approximate scorecard construction as a one-step algorithm.
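The traditional two-step process can be sketched as follows: a continuous value is first assigned to a bin and encoded, and a separate model then supplies one weight per encoded column. This is a generic sketch, not the Infinitesimal Bins algorithm; the cut points and weights are hypothetical, and the weights would normally come from a fitted model.

```python
from bisect import bisect_right


def discretise(value, cut_points):
    """Step 1a: return the index of the bin containing value.

    cut_points must be sorted in ascending order; a value equal to a cut
    point falls into the bin on its right.
    """
    return bisect_right(cut_points, value)


def one_hot(bin_index, n_bins):
    """Step 1b: encode the bin index as a one-hot vector."""
    return [1 if i == bin_index else 0 for i in range(n_bins)]


def encode(value, cut_points):
    """Discretise and encode a single continuous value."""
    return one_hot(discretise(value, cut_points), len(cut_points) + 1)


def score(encoded, weights):
    """Step 2: apply the weights (points) estimated by a predictive model."""
    return sum(w * x for w, x in zip(weights, encoded))


cut_points = [30, 50, 70]   # hypothetical thresholds for one feature
weights = [0, 1, 3, 5]      # hypothetical points, one per bin
print(encode(45, cut_points), score(encode(45, cut_points), weights))
```

In this arrangement the cut points are fixed before any weights are estimated, which is the separation that a one-step approach aims to remove.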

Additionally, recent literature emphasises optimisation-based approaches over machine learning-based ones for scorecard design, particularly for binary scorecards, with RiskSLIM being the most notable method. However, its extension to ordinal classification remains underexplored. This work addresses this gap by adapting RiskSLIM for ordinal data through a Data Replication framework.
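To give a sense of how a data replication scheme can turn an ordinal problem into a binary one, the simplified sketch below replicates each sample once per class threshold, appends an indicator of that threshold, and labels the replica by whether the true class exceeds it. This is an assumption about the general technique, written in a simplified form; the exact construction used in the thesis, and how it is coupled with RiskSLIM, is not specified in this abstract. The function names are hypothetical.

```python
def replicate(X, y, n_classes):
    """Build a binary dataset from ordinal data with labels in 1..n_classes.

    Each sample yields n_classes - 1 replicas, one per threshold k. A replica
    carries the original features plus a one-hot indicator of k, and its
    binary label answers the question "is the true class greater than k?".
    """
    X_rep, y_rep = [], []
    for features, label in zip(X, y):
        for k in range(1, n_classes):
            indicator = [1 if j == k else 0 for j in range(1, n_classes)]
            X_rep.append(list(features) + indicator)
            y_rep.append(1 if label > k else 0)
    return X_rep, y_rep


def predict_ordinal(binary_predict, features, n_classes):
    """Recover a class as 1 plus the number of thresholds judged exceeded."""
    exceeded = 0
    for k in range(1, n_classes):
        indicator = [1 if j == k else 0 for j in range(1, n_classes)]
        exceeded += binary_predict(list(features) + indicator)
    return 1 + exceeded


X_rep, y_rep = replicate([[0.5]], [2], n_classes=3)
print(X_rep, y_rep)
```

Any binary scorecard learner could in principle be trained on the replicated data, with the indicator columns acting as per-threshold offsets.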

Special emphasis is placed on healthcare applications, where ordinal outcomes are more common and the interpretability and transparency of models are crucial for clinical usage. In particular, this thesis investigates the prediction of aesthetic results after breast cancer conservative treatment.

The experimental evaluation shows that Infinitesimal Bins tends to produce larger models than the baseline discretiser. Its granularity can be beneficial for small datasets, but it often leads to excessively complex models on larger datasets, indicating the need for refinement through bin merging or pruning. In terms of encoding methods, Differential Coding consistently outperforms One-Hot Encoding by reducing overfitting, improving sparsity, and achieving higher accuracy. Among classifiers, in binary tasks, the sparsity-inducing Generic Generalised Linear Estimator (skglm) model achieves the best balance between compactness and predictive performance. For ordinal tasks, however, no single classifier proves consistently superior; the choice of classifier should therefore depend on the desired trade-off. Exploratory analyses further reveal that manual feature selection and ensemble strategies enhance interpretability and performance. Additionally, the extension of RiskSLIM to ordinal classification demonstrates the feasibility of optimisation-based approaches in this setting.
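The contrast between the two encodings can be sketched as below, assuming that Differential Coding denotes a cumulative (thermometer-style) encoding in which each weight is the change in points between adjacent bins. That reading is an assumption, as the abstract does not define the encoding, and the weights shown are hypothetical.

```python
def one_hot(bin_index, n_bins):
    """One column per bin; exactly one column is active."""
    return [1 if i == bin_index else 0 for i in range(n_bins)]


def differential(bin_index, n_bins):
    """Assumed differential coding: column i is active once the value has
    passed cut point i, so bin 0 is all zeros and bin k has k leading ones."""
    return [1 if i < bin_index else 0 for i in range(n_bins - 1)]


def score(encoded, weights):
    return sum(w * x for w, x in zip(weights, encoded))


n_bins = 4
delta_weights = [1, 2, 2]        # hypothetical increments between adjacent bins
absolute_weights = [0, 1, 3, 5]  # equivalent one-hot points (cumulative sums)

for b in range(n_bins):
    # Both encodings yield the same total points for every bin.
    assert score(differential(b, n_bins), delta_weights) == \
        score(one_hot(b, n_bins), absolute_weights)
```

Under this reading, a zero increment means two adjacent bins receive the same points, so a sparsity penalty on the increments effectively merges bins. This is one plausible explanation of the sparsity benefit reported above, not a claim made by the abstract.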

Details

1010268
Title
Optimised Interpretable Ordinal Scorecards
Number of pages
83
Publication year
2025
Degree date
2025
School code
5896
Source
MAI 87/6(E), Masters Abstracts International
ISBN
9798265499318
University/institution
Universidade do Porto (Portugal)
University location
Portugal
Degree
Master's
Source type
Dissertation or Thesis
Language
English
Document type
Dissertation/Thesis
Dissertation/thesis number
32427189
ProQuest document ID
3288406806
Document URL
https://www.proquest.com/dissertations-theses/optimised-interpretable-ordinal-scorecards/docview/3288406806/se-2?accountid=208611
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Database
ProQuest One Academic