This dissertation presents a novel information-theoretic framework, ROADMAP (Representation, Organization, and Analysis for Data Modeling and Annotative Predictions), for advancing information representation, knowledge organization, and predictive analytics in machine learning. Motivated by emerging limitations of classical entropy-based approaches, this research explores how new formulations in information theory can improve the accuracy, robustness, interpretability, and annotation efficiency of AI systems. Central to this investigation is DLITE (Discounted Least Information-Theoretic Entropy), a new entropy metric grounded in Least Information Theory (LIT). DLITE is a mathematically rigorous formulation designed to quantify bounded entropy change, normalize scale effects, and satisfy essential information-metric properties, providing both theoretical clarity and practical utility as an alternative entropy-based quantification model.
DLITE was formulated by Ke (2020, 2022a) to address key theoretical and practical gaps in traditional information-theoretic models such as Shannon entropy and KL divergence. Unlike these widely used measures, which lack metric distance properties, DLITE introduces a bounded, symmetric measure of entropy difference, discounting redundant or scale-sensitive information through a formal entropy discount. By satisfying nonnegativity, symmetry, and the identity of indiscernibles, DLITE provides a scale-invariant, semantically meaningful framework that is theoretically rigorous and computationally robust.
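To illustrate the metric properties named above, the sketch below computes a least-information-style quantity as the sum of per-outcome absolute changes of g(x) = x − x ln x (the antiderivative of log(1/x)), and checks nonnegativity, symmetry, and identity of indiscernibles numerically. This is a minimal sketch under that assumption, not Ke's full formulation: in particular, the entropic discount that distinguishes DLITE from the underlying least-information quantity is omitted here, and the function names are illustrative.

```python
import numpy as np

def g(x):
    # Antiderivative of log(1/t): integral of -ln(t) dt = t - t*ln(t),
    # with the convention g(0) = 0 (the limit as x -> 0+).
    return np.where(x > 0, x - x * np.log(np.clip(x, 1e-300, None)), 0.0)

def least_information(p, q):
    # Sum of per-outcome absolute entropy changes |g(q_i) - g(p_i)|
    # between two discrete distributions p and q (illustrative sketch;
    # DLITE additionally applies an entropy discount not shown here).
    return float(np.sum(np.abs(g(np.asarray(q)) - g(np.asarray(p)))))

p = np.array([0.5, 0.5])
q = np.array([0.9, 0.1])

assert least_information(p, q) >= 0                                   # nonnegativity
assert np.isclose(least_information(p, q), least_information(q, p))   # symmetry
assert np.isclose(least_information(p, p), 0.0)   # identity of indiscernibles
```

The point of the checks is that, unlike KL divergence, a measure built from absolute entropy changes is symmetric by construction and vanishes exactly when the two distributions coincide.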
Central to this study is also the conviction that a structured framework is essential for addressing complex problems in AI. The ROADMAP framework, a three-tier model, ensures that each stage of information processing is logically aligned, systematically implemented, and analytically evaluated, providing a scaffolded strategy for integrating theory with methodology and enabling deeper insight and reproducible experimentation. Accordingly, this study operationalizes DLITE within the tiered ROADMAP framework. In Tier 1, DLITE informs feature weighting via TF-iDL, an entropy-discounted alternative to TF-IDF that better reflects semantic salience. In Tier 2, the DLITE Impurity Measure (DIM) is integrated into decision tree models, yielding more stable and informative node splits for hierarchical knowledge classification. In Tier 3, DLITE Loss is deployed as a loss function in transformer-based deep learning models, offering strong recall, fast convergence, and resilience to class imbalance, as evidenced in comparative experiments against Cross-Entropy and KL Divergence on CoNLL-2003, Basic NER, and the Broad Twitter Corpus.
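To make the Tier 3 idea concrete, the sketch below uses a least-information-style change between a model's softmax output and a one-hot target as a classification loss, in the same drop-in position where Cross-Entropy would normally sit. This is an illustration only: `li_loss` is a hypothetical name, and the published DLITE Loss additionally applies an entropic discount that this sketch omits.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def li_loss(logits, target_onehot):
    """Illustrative entropy-change loss (hypothetical; not Ke's exact
    DLITE Loss): mean over the batch of the summed absolute changes of
    g(x) = x - x*ln(x) between prediction and one-hot target."""
    q = softmax(logits)
    g = lambda x: np.where(x > 0, x - x * np.log(np.clip(x, 1e-12, None)), 0.0)
    return float(np.abs(g(q) - g(target_onehot)).sum(axis=-1).mean())

# A confident correct prediction should incur a smaller loss
# than a confident wrong one.
target = np.array([[1.0, 0.0, 0.0]])
good = li_loss(np.array([[5.0, 0.0, 0.0]]), target)
bad = li_loss(np.array([[0.0, 5.0, 0.0]]), target)
assert good < bad
```

Because the quantity is bounded per outcome, gradients from badly miscalibrated examples cannot blow up the way log-loss terms can, which is one intuition for the class-imbalance resilience reported in the Tier 3 experiments.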
Beyond empirical performance, DLITE supports selective annotation strategies and enhances explainability through interpretable attention and entropy landscapes. It offers a new way of thinking about how learning systems evaluate and optimize information, especially under conditions of noise, ambiguity, and limited data. This dissertation demonstrates that DLITE is not just a derivative of existing information theory but a substantive contribution to it: one that redefines how entropy is modeled for learning, ranking, and reasoning in machine intelligence.
But this research is also personal. It is not only a pursuit of technical advancement but a tribute to intellectual lineage. Like the storied advisor-advisee relationships that shaped entire disciplines—Frege and Wittgenstein, Thomson and Rutherford, Hilbert and von Neumann—this dissertation emerges from a legacy of mentorship. The core theories at its heart, LIT and DLITE, were conceptualized by my advisor, Dr. Weimao Ke. My work, then, is not only a scholarly inquiry, but a continuation—a living expression of the ideas entrusted to me.
This dissertation exemplifies how new knowledge can emerge from lineage, shaped by the values of academic stewardship and inspired by the responsibility to carry forward a vision. In the tradition of advisor-advisee collaborations that have moved science and scholarship forward, this work aims not just to validate a theory, but to extend its reach. DLITE is more than a mathematical model; it is an invitation to rethink how we quantify, interpret, and apply information. In that spirit, this dissertation contributes to the evolving landscape of explainable, human-aligned AI, and to the scholarly lineage from which it was born.
