User profile data plays a central role in modern data-driven systems, particularly within Customer Data Platforms (CDPs), where it is leveraged to personalise user experiences and drive targeted marketing campaigns. However, these platforms typically offer only a snapshot of the most recent profile state, neglecting the temporal dynamics that can reveal valuable insights about user behaviour and profile evolution. This dissertation investigates how to store user profile data over time so that previous states remain accessible, supporting historical querying and temporal analysis that can uncover trends.
The work begins by examining the conceptual foundations of time in databases, distinguishing between valid time (when data is true in the real world) and transaction time (when data is recorded in the system). Bitemporal data models, which integrate both dimensions, are explored as a means of capturing temporal complexity. Temporal databases, which preserve data changes over time and support time-aware querying, emerge as promising candidates for representing versioned user profiles. Yet, despite their potential, little research has assessed their suitability and performance in this specific context.
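The distinction between valid time and transaction time can be illustrated with a minimal Python sketch. It is not taken from any of the systems discussed here: the `BitemporalFact` class, the `as_of` function, and the sample data are hypothetical, and the rule that the most recently recorded fact wins is a simplifying assumption.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class BitemporalFact:
    """One attribute value with both time dimensions (hypothetical model)."""
    key: str
    value: str
    valid_from: datetime            # valid time: when the value became true
    valid_to: Optional[datetime]    # valid time: None means "still true"
    recorded_at: datetime           # transaction time: when it was stored


def as_of(facts, key, valid_at, known_at):
    """Value of `key` that held at `valid_at`, using only what the
    system had recorded by `known_at`. Returns None if nothing matches."""
    candidates = [
        f for f in facts
        if f.key == key
        and f.recorded_at <= known_at
        and f.valid_from <= valid_at
        and (f.valid_to is None or valid_at < f.valid_to)
    ]
    if not candidates:
        return None
    # Simplifying assumption: the latest recorded fact supersedes earlier ones.
    return max(candidates, key=lambda f: f.recorded_at).value


# Illustrative data: the user moved on 10 January 2024,
# but the change was only recorded on 1 February 2024.
facts = [
    BitemporalFact("city", "Porto", datetime(2023, 1, 1), None,
                   datetime(2023, 1, 1)),
    BitemporalFact("city", "Lisbon", datetime(2024, 1, 10), None,
                   datetime(2024, 2, 1)),
]

# What the system believed on 20 January about 15 January.
before_correction = as_of(facts, "city", datetime(2024, 1, 15),
                          datetime(2024, 1, 20))
# What the system believes on 1 March about the same day.
after_correction = as_of(facts, "city", datetime(2024, 1, 15),
                         datetime(2024, 3, 1))
```

In this sketch the same valid-time question yields different answers depending on the transaction-time cut-off, which is the behaviour a bitemporal model is meant to capture.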
A systematic literature review is conducted to investigate how user profiles are modelled across domains. Representations range from ontology-based and concept-based models to graph-based approaches, highlighting the structural diversity and semantic richness of user data. The review identifies unique characteristics in user profile versioning, such as schema heterogeneity and incremental data evolution.
Additionally, a state-of-the-art review of temporal databases is carried out, focusing on systems that provide native support for temporal semantics, time-travel queries, immutable storage, and temporal indexing. From this review, three representative systems, PostgreSQL, XTDB v2, and TerminusDB, are selected for an empirical benchmark.
The main contribution of this dissertation is a benchmark evaluating how these systems store and retrieve versioned user profile data. Two storage strategies are compared: snapshot-based, where each complete version of a profile is stored independently, and operation-based, where only the differences between consecutive versions are retained. Using a realistic dataset derived from anonymised CDP data, comprising over 14 million updates across 713 thousand user profiles, we assess system performance across key dimensions, including storage efficiency, write latency, and retrieval time.
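The difference between the two storage strategies can be sketched in a few lines of Python. This is an illustration only, not the benchmark implementation: all function names and the sample profile are hypothetical, profiles are assumed to be flat dictionaries, and nested values are compared as a whole.

```python
from typing import Any

Profile = dict[str, Any]


def diff(old: Profile, new: Profile) -> dict:
    """Operations that turn `old` into `new`: keys to set and keys to unset."""
    to_set = {k: v for k, v in new.items() if k not in old or old[k] != v}
    to_unset = [k for k in old if k not in new]
    return {"set": to_set, "unset": to_unset}


def apply_delta(profile: Profile, delta: dict) -> Profile:
    """Return a new profile with the delta applied (input is not mutated)."""
    result = dict(profile)
    result.update(delta["set"])
    for key in delta["unset"]:
        result.pop(key, None)
    return result


def snapshot_store(versions: list[Profile]) -> list[Profile]:
    """Snapshot-based: keep a full copy of every version."""
    return [dict(v) for v in versions]


def operation_store(versions: list[Profile]) -> list[dict]:
    """Operation-based: keep only the differences between consecutive versions."""
    deltas = []
    previous: Profile = {}
    for version in versions:
        deltas.append(diff(previous, version))
        previous = version
    return deltas


def reconstruct(deltas: list[dict], index: int) -> Profile:
    """Rebuild version `index` by replaying deltas from the beginning."""
    profile: Profile = {}
    for delta in deltas[: index + 1]:
        profile = apply_delta(profile, delta)
    return profile


# Illustrative profile history (made-up data).
versions = [
    {"name": "Ana", "city": "Porto"},
    {"name": "Ana", "city": "Lisbon", "plan": "pro"},
    {"name": "Ana", "plan": "pro"},
]
snapshots = snapshot_store(versions)
deltas = operation_store(versions)
```

Under these assumptions, reading version `i` from `snapshots` is a single lookup, whereas `reconstruct(deltas, i)` replays `i + 1` deltas. The operation-based store, in turn, avoids repeating unchanged attributes such as `name`.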
The results reveal trade-offs between storage strategies and across database systems, exposing the strengths and limitations of each approach. Ultimately, this work offers a comprehensive evaluation of temporal databases for user profile storage, providing practical insights for systems that seek to incorporate temporal reasoning into customer data management.