Time Series Clustering (TSC) is a well-known method of temporal clustering that yields dynamic cluster centers and static cluster labels. However, it is not suitable for identifying entities that do not clearly conform to a single temporal cluster definition. A popular existing method that attempts to allow for label change is Temporal Label Analysis (TLA). However, TLA yields static cluster centers and dynamic labels, so it is not applicable to cases where the cluster definitions (centers) evolve over time. As our first contribution, we show that TSC and TLA are only subsets of a broader design space, and we propose a generalized Mixed Integer Linear Programming (MILP) framework that can reproducibly cluster temporal data according to any configuration in the design space with optimality guarantees. In addition, we built a Python package called tscluster, which uses our MILP framework for temporal clustering spanning the design space. While TSC can be extended to predictive time series clustering tasks, little research has been done on applying predictive clustering to time series data. The baseline methods for predictive time series clustering do not account for causality, which makes it difficult for them to identify predictive relationships between the time series features and the target feature. As our second contribution, we introduce the Granger Causal Tree (GCT), a novel method that extends TSC to predictive time series clustering based on the “important” features identified as Granger-causing the target feature, thus bridging these research gaps.
