Abstract
Trajectory representation learning transforms raw trajectory data (sequences of spatiotemporal points) into low-dimensional representation vectors to improve downstream tasks such as trajectory similarity computation, prediction, and classification. Existing models primarily adopt self-supervised learning frameworks, often employing models like Recurrent Neural Networks (RNNs) as encoders to capture local dependencies in trajectory sequences. However, individual mobility within urban areas exhibits regular and periodic patterns, suggesting the need for a more comprehensive representation from both local and global perspectives. To address this, we propose TrajRL-TFF, a trajectory representation learning method based on time-domain and frequency-domain feature fusion. First, considering the heterogeneous distribution of trajectory data in space, a quadtree is employed for spatial partitioning and coding. Then, each trajectory is converted into a quadtree-code-based time series (i.e., a time-domain signal), with its corresponding frequency-domain signal derived via the Discrete Fourier Transform (DFT). Finally, a trajectory encoder, combining an RNN-based time-domain encoder and a Transformer-based frequency-domain encoder, is constructed to capture the trajectory’s local and global features, respectively, and trained within a self-supervised sequence encoding-decoding framework with a trajectory perturbation-reconstruction task. Experiments demonstrate that TrajRL-TFF outperforms baselines in downstream tasks including trajectory querying and prediction, confirming that integrating time- and frequency-domain signals enables a more comprehensive representation of human mobility regularities and patterns, which provides valuable guidance for trajectory representation learning and trajectory modeling in future studies.
Introduction
Trajectory representation learning refers to the process of transforming raw trajectory data (composed of sequential trajectory points) into real-valued vectors through neural network models (Jiang et al., 2023). Compared to raw trajectories, using representations has significant advantages. For instance, raw trajectories often suffer from data redundancy, noise, missing values, non-uniform sampling rates, and variable lengths, typically requiring extensive data preprocessing and feature engineering. By contrast, well-designed representation learning models automatically transform raw trajectories into fixed-length vectors that preserve their underlying spatiotemporal characteristics. The resulting representations enable direct integration with machine/deep learning pipelines for diverse downstream applications such as trajectory similarity computation, querying (Li et al., 2024), clustering (Yao et al., 2017; Fang et al., 2021), prediction (Lv et al., 2018), classification (Liang et al., 2021; Endo et al., 2016), and anomaly detection (Liu et al., 2020; Zhang et al., 2022).
Existing trajectory representation learning methods can be broadly categorized into supervised and self-supervised approaches. Supervised methods train trajectory encoders using pairwise similarity scores—typically measured by metrics such as edit distance—as ground-truth labels (Yao et al., 2022; Fang et al., 2022; Yang et al., 2024). Self-supervised methods, on the other hand, exploit intrinsic structure within the trajectory data itself as supervisory signals, and primarily follow two training paradigms: (a) Encoding-decoding frameworks, where the encoder is trained by reconstructing original trajectories from perturbed versions (e.g., downsampled, distorted, masked, or noise-injected) (Li et al., 2018; Lin et al., 2024; Zhu et al., 2024); (b) Contrastive learning frameworks, where positive (similar) and negative (dissimilar) trajectory pairs are fed into a dual-encoder model, and the encoder is trained by minimizing the distance between positive pairs while maximizing it for negative ones (Yan et al., 2023; Lin et al., 2023; Chang et al., 2023). Some studies have adopted a combination of the above two self-supervised learning strategies to train trajectory encoders—for example, Jiang et al. (2023).
Despite recent advances, most existing trajectory representation learning models are designed for and trained on vehicle trajectories (Fu and Lee, 2020; Chen et al., 2021; Fang et al., 2022; Jiang et al., 2023; Ma et al., 2024). However, individual stay-point trajectories (which can be derived from mobile phone signaling data), reflecting residents’ daily activity locations, represent another critical category. As shown in Fig. 1, these trajectories differ significantly from vehicle movements in their spatiotemporal characteristics. For example, individuals tend to follow regular daily routines, exhibiting strong periodicity in their mobility patterns (Song et al., 2010). Moreover, most individuals can be grouped into a limited number of archetypes based on their mobility and activity behaviors, such as commuters, homebodies, night owls, and explorers (Ji et al., 2023; Cao et al., 2019; Jiang et al., 2012). Existing models are generally ill-suited for such trajectories, as they typically assume alignment with road networks via map matching and rely on external contextual factors—such as road topology, road types, and travel speed—to enhance performance. Although a few methods (e.g., t2Vec, TrajGAT, TrajCL) do not explicitly incorporate road network information and may appear adaptable to individual stay-point data, they are not specifically designed to capture its unique regularities (e.g., Li et al., 2018; Yao et al., 2022; Chang et al., 2023).
[See PDF for image]
Fig. 1
Demonstration of individual stay-point trajectory and vehicle trajectory
Therefore, we propose TrajRL-TFF, a Trajectory Representation Learning method tailored for individual stay-point trajectories by integrating local and global features through Time- and Frequency-domain feature Fusion. First, we partition the study area by a quadtree to account for the heterogeneous distribution of trajectory data. Secondly, each trajectory is converted into a quadtree-code-based time series as its time-domain signal, with the corresponding frequency-domain signal derived through Discrete Fourier Transform (DFT). Finally, a trajectory encoder, combining an RNN-based time-domain encoder and a Transformer-based frequency-domain encoder, is constructed to capture local and global trajectory features, respectively. The model is trained via a self-supervised sequence encoding-decoding (Seq2Seq) framework with a trajectory reconstruction task. Once trained, the trajectory encoder can generate representations for any individual stay-point trajectory.
The innovations of this study are mainly reflected in the following aspects:
We adopt a quadtree to partition the city and encode the resulting spatial units (i.e., regions). By accounting for the heterogeneous spatial distribution of trajectory points, this method assigns higher spatial resolution to point-dense areas and ensures that all regions contain sufficient data for training. Such adaptive partitioning improves the quality of trajectory representations by balancing spatial granularity and data availability.
We represent quadtree-code-based trajectories as time series, from which frequency-domain signals can be derived. Because spatially adjacent units share common code prefixes, the resulting time series exhibit smoother transitions with reduced large numerical jumps between consecutive trajectory points. This property suppresses high-frequency noise in the transformed signal and facilitates clearer identification of periodic patterns.
An RNN-based time-domain encoder and a Transformer-based frequency-domain encoder are constructed to capture local and global trajectory features, respectively. These two components are fused to form the overall trajectory encoder. This design effectively enhances the representation learning of individual stay-point trajectories characterized by regular daily routines and pronounced mobility patterns.
Experimental results demonstrate that our method consistently outperforms baseline models. Ablation studies further confirm the benefit of representing trajectories as quadtree-code-based time series and incorporating frequency-domain features to capture global mobility patterns. These results provide valuable insights for future research in trajectory representation learning.
Related work
This section provides a review of recent advancements in trajectory representation learning methods, and highlights the positioning of this study within the existing literature as well as the specific research gap it aims to address.
Supervised approach for trajectory representation learning
Supervised approaches for trajectory representation learning aim to map raw trajectory data into low-dimensional vector spaces using supervisory signals (e.g., labels). The core objective is to leverage ground-truth labels (e.g., pairwise trajectory similarity) to guide models in capturing spatiotemporal dependencies, movement patterns, and semantic features within trajectories (Yao et al., 2019; Zhang et al., 2021). Following similar supervised learning paradigms, the main differences among related studies lie in the design of the encoder. For example, Yang et al. (2021) proposed T3S, which leverages a self-attention-based network to encode grid-represented trajectories and an LSTM network to encode coordinate-represented trajectories, effectively capturing structural and spatial information, respectively. Yao et al. (2022) proposed TrajGAT, which represents trajectories using a quadtree-based graph structure and employs a graph-attention-based Transformer (GAT) to encode them. Fang et al. (2022) proposed ST2Vec, which applies a temporal modeling module (TMM) and a spatial modeling module (SMM) to generate temporal and spatial representations of trajectories, respectively, and combines them using a spatio-temporal co-attention fusion (STCF) module. A simple but effective triplet-based pair-wise loss is also designed in that study. Yang et al. (2024) proposed SIMformer, which uses a single-layer vanilla transformer encoder as the feature extractor and employs pairwise trajectory distances as supervisory signals to constrain the vector space.
The above approaches utilize traditional trajectory similarity metrics (e.g., Hausdorff, DTW, Fréchet distance) to compute pairwise trajectory distances as supervisory signals, guiding models to learn suitable representation spaces for trajectory embeddings. However, these methods depend on large amounts of labeled data, which are computationally expensive to obtain. Moreover, employing different trajectory similarity metrics as supervisory signals can lead models to learn different types of trajectory representations. In practice, the optimal similarity metric often varies across application scenarios, which may limit the model’s generalization capability.
Self-supervised approach for trajectory representation learning
Self-supervised approaches primarily follow either encoding-decoding or contrastive learning as training frameworks. Encoding-decoding frameworks train trajectory encoders by reconstructing trajectories from perturbed inputs (Li et al., 2018; Fang et al., 2021). In contrast, contrastive learning frameworks optimize trajectory encoders using contrastive objectives that distinguish between similar and dissimilar trajectory pairs (Lin et al., 2023).
Encoding-decoding frameworks
In the training phase of the encoding-decoding frameworks, the encoder transforms a perturbed trajectory (e.g., a downsampled, distorted, or noise-injected version of the original) into a fixed-dimensional representation, while the decoder aims to reconstruct the original trajectory from this representation. Once trained, the encoder can be used to generate embeddings for any input trajectories. t2Vec (Li et al., 2018) represents the most classical approach, where an RNN-based encoder and an RNN-based decoder are applied. Many other studies have incorporated additional trajectory modeling factors (Ma et al., 2024; Li et al., 2023a). For example, Trembr (Fu and Lee, 2020) leverages auxiliary features—such as road types and traffic conditions—by embedding them alongside road segment IDs and integrating them into the encoder. This allows the model to learn more fine-grained representations that reflect both the topological and semantic properties of the underlying road network. Toast (Chen et al., 2021) represents each trajectory as a sequence of road segments. It introduces a traffic-context-aware skip-graph module, which incorporates an auxiliary traffic-related prediction task to learn informative road segment representations, and subsequently, route representations. These route representations are then fed into a customized bidirectional Transformer encoder, which is trained in a self-supervised manner through route recovery and trajectory discrimination tasks. TrajFM (Lin et al., 2024) constructs a vehicle trajectory encoder, STRFormer, which integrates the spatial and temporal features of trajectory points along with information about nearby POIs. The encoder is trained using sub-trajectory reconstruction and modality reconstruction tasks to enhance the model’s transferability across both regions and tasks.
Contrastive learning frameworks
In contrastive learning frameworks, researchers generate positive and negative trajectory pairs using various data augmentation techniques, and feed them into two encoders, respectively. The dual encoders, which share parameters, are optimized by minimizing the distance between positive pairs while maximizing it between negative pairs (Ma et al., 2024; Lin et al., 2023; Li et al., 2023b; Chang et al., 2023). For example, Yan et al. (2023) proposed a dual-view trajectory contrastive learning framework, in which three auxiliary pretraining tasks—trajectory imputation, destination prediction, and trajectory-user linking—are employed to support the training of a Transformer-based trajectory encoder alongside the contrastive learning objective. Chang et al. (2023) proposed TrajCL, where a dual-feature self-attention-based trajectory encoder is designed to jointly learn both the spatial and the structural patterns of trajectories. Li et al. (2023b) utilize trajectory augmentation to generate both low-distortion and high-fidelity views of trajectories, and introduce a contrastive loss to enhance representation consistency between the two views. Lin et al. (2023) proposed MMTEC, a multi-view pretraining approach that processes both discrete and continuous trajectories using an attention-based discrete encoder and a NeuralCDE-based continuous encoder, respectively.
Summary
While the approaches discussed above can produce high-quality trajectory representations with or without labeled data, most existing models remain tailored to vehicle trajectories constrained by road networks. These models often rely on external information such as road topology, road types, and travel speeds to enhance performance. However, beyond vehicle movement, another important type of trajectory, i.e., individual stay-point trajectories, exhibits fundamentally different characteristics. These trajectories demonstrate strong spatiotemporal regularities, as daily human activities often follow periodic patterns and can be clustered into a limited number of behavioral types (Ji et al., 2023; Cao et al., 2019; Jiang et al., 2012). Most existing models that rely heavily on road network features are not applicable to this type of data. Although some methods (e.g., t2Vec, TrajGAT, TrajCL) do not incorporate explicit road network factors and may appear adaptable to individual stay-point trajectories, they are not specifically designed for such data and fail to account for its inherent regularities. Therefore, it is necessary to design a trajectory representation learning model specifically for individual stay-point trajectories by taking their key characteristics into account.
Preliminaries
Definition 1
Trajectory. Given a trajectory O = (p_1, p_2, ..., p_n) of an individual, let p_i = (l_i, t_i) denote the i-th stay point of the trajectory, where l_i represents the corresponding two-dimensional geographic coordinates (latitude and longitude), and t_i denotes the time slot ID with equal time intervals (e.g., 1 hour).
Definition 2
Trajectory Representation Learning. Given a trajectory sequence O of an individual, the goal of trajectory representation learning is to learn a trajectory encoder that maps the trajectory into a d-dimensional real-valued vector v ∈ R^d, where d is the dimensionality of the vector. This vector captures the spatiotemporal characteristics of the trajectory, enabling its use in downstream tasks such as trajectory querying, prediction, and classification.
Methodology
We propose a trajectory representation learning method based on time-frequency domain feature fusion (TrajRL-TFF). Figure 2 illustrates the framework of the approach, which consists of three main components: (1) spatial partitioning and coding based on quadtree, (2) extraction of quadtree-code-based time and frequency domain signals, and (3) construction of trajectory representation learning model with time-frequency domain feature fusion.
[See PDF for image]
Fig. 2
Framework of TrajRL-TFF for trajectory representation learning
Spatial partitioning and coding based on quadtree
A commonly used spatial partitioning method in trajectory modeling divides the study area into non-overlapping, equally sized grid cells. However, this uniform grid structure fails to capture the heterogeneous spatial distribution of trajectory data. Regions with dense trajectory points lack sufficient spatial granularity, while sparsely populated or empty regions consume unnecessary grid resources and receive insufficient training.
To address this issue, as illustrated in Fig. 3, we utilize a quadtree to partition the geographical space and code the divided regions, transforming the coordinate-based trajectories into quadtree-code-based trajectories. A quadtree is a tree-based hierarchical data structure used to recursively partition two-dimensional space into nested square regions. It is especially effective for representing spatial data with non-uniform distributions, enabling adaptive resolution according to local data density. The core idea of the quadtree is to divide a square region into four equal-sized quadrants: northwest (NW), northeast (NE), southwest (SW), and southeast (SE). Each quadrant can be further subdivided if it contains more than a certain threshold of data points, resulting in a recursive decomposition that forms a tree structure where each internal node has exactly four children.
[See PDF for image]
Fig. 3
Quadtree-based spatial partitioning and coding
The process of quadtree partitioning and coding is as follows:
Initialization: Define a square bounding box that covers the entire study area as the root node of the quadtree. This node is considered level 0.
Recursive Partitioning: Each node (i.e., region) in the quadtree has an associated level, indicating its depth in the hierarchy. At each level, if the number of trajectory points within a node exceeds a predefined threshold, the node is subdivided into four equal quadrants—top-left (NW), top-right (NE), bottom-left (SW), and bottom-right (SE)—each becoming a child node at the next level (e.g., level 1, level 2, etc.). This process continues recursively: any child node that still exceeds the threshold is further subdivided, until either (1) the number of points falls below the threshold or (2) a specified maximum depth level is reached (e.g., level 4).
Coding: Each node is assigned a unique code that reflects its path in the quadtree hierarchy. At each subdivision, the four quadrants are indexed consistently—1: top-left (NW), 2: top-right (NE), 3: bottom-left (SW), 4: bottom-right (SE). A node's code is formed by concatenating the indices of the quadrants traversed from the root to that node. For instance, a region located in the NE quadrant at level 1, and then in the SW quadrant at level 2, will be assigned the code 23. To ensure consistent code lengths across all regions, shorter codes are padded with trailing zeros up to the maximum level. For example, if the maximum depth is 4, the code 23 will be padded to 2300.
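To make the procedure above concrete, the following sketch illustrates quadtree partitioning and coding in Python. The bounding box handling, the split threshold, and the maximum depth are illustrative assumptions rather than the exact settings used in this study.

```python
# Illustrative sketch of quadtree partitioning and coding (assumed parameters).
from dataclasses import dataclass, field

MAX_DEPTH = 4          # maximum subdivision level (assumed)
SPLIT_THRESHOLD = 100  # max points a region may hold before splitting (assumed)

@dataclass
class QuadNode:
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    depth: int = 0
    code: str = ""                 # concatenated quadrant indices from the root
    children: list = field(default_factory=list)

def build_quadtree(node, points):
    """Recursively subdivide a square region while it holds too many points."""
    inside = [(x, y) for x, y in points
              if node.x_min <= x < node.x_max and node.y_min <= y < node.y_max]
    if len(inside) <= SPLIT_THRESHOLD or node.depth >= MAX_DEPTH:
        return  # leaf region
    cx, cy = (node.x_min + node.x_max) / 2, (node.y_min + node.y_max) / 2
    # Quadrant indices: 1 = NW (top-left), 2 = NE, 3 = SW, 4 = SE
    quadrants = {
        "1": (node.x_min, cy, cx, node.y_max),
        "2": (cx, cy, node.x_max, node.y_max),
        "3": (node.x_min, node.y_min, cx, cy),
        "4": (cx, node.y_min, node.x_max, cy),
    }
    for idx, (x0, y0, x1, y1) in quadrants.items():
        child = QuadNode(x0, y0, x1, y1, node.depth + 1, node.code + idx)
        build_quadtree(child, inside)
        node.children.append(child)

def leaf_codes(node):
    """Collect leaf codes, padded with trailing zeros to the maximum depth (e.g., '23' -> '2300')."""
    if not node.children:
        return [node.code.ljust(MAX_DEPTH, "0")]
    return [c for child in node.children for c in leaf_codes(child)]
```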
Extraction of quadtree-code-based time- and frequency-domain signals
Following the approach described in Sect. 4.1, we convert each coordinate-based trajectory into a quadtree-code-based trajectory T^k = (c_1^k, c_2^k, ..., c_n^k), where c_i^k denotes the quadtree code of the region corresponding to the i-th time slot of individual k. This sequence forms a typical time series and can be regarded as the time-domain signal of the trajectory.
In order to capture the global characteristics of individual stay-point trajectories with pronounced daily activity patterns and regularities, the Discrete Fourier Transform (DFT) (Winograd, 1978) is employed to project the time-domain signal into the frequency domain. This frequency-domain representation enables the analysis of trajectory patterns from a global perspective. Unlike time-domain signals that capture local changes between consecutive time slots, frequency components summarize the overall periodic structure of the entire sequence. In particular, low-frequency components reflect long-term, repeating behaviors—such as daily routines—while high-frequency components capture short-term variations or noise. Therefore, analyzing the frequency spectrum allows us to effectively identify and quantify recurring activity patterns over the entire trajectory.
Let T = (T_0, T_1, ..., T_{N-1}) be the time-domain signal of a trajectory of length N, where T_n is the quadtree code at the n-th time slot. The DFT computes the frequency spectrum as follows:

X_k = \sum_{n=0}^{N-1} T_n \, e^{-j 2\pi k n / N}, \quad k = 0, 1, \dots, N-1  (1)

where j is the imaginary unit and X_k represents the spectrum of T at the k-th frequency. The spectrum consists of real parts and imaginary parts as:

X_k = \mathrm{Re}(X_k) + j\,\mathrm{Im}(X_k)  (2)

\mathrm{Re}(X_k) = \sum_{n=0}^{N-1} T_n \cos\left(\frac{2\pi k n}{N}\right)  (3)

\mathrm{Im}(X_k) = -\sum_{n=0}^{N-1} T_n \sin\left(\frac{2\pi k n}{N}\right)  (4)

The amplitude part A_k and phase part φ_k of X_k are defined as:

A_k = \sqrt{\mathrm{Re}(X_k)^2 + \mathrm{Im}(X_k)^2}  (5)

φ_k = \arctan\left(\frac{\mathrm{Im}(X_k)}{\mathrm{Re}(X_k)}\right)  (6)
Amplitude characterizes the contribution strength of each frequency component, reflecting the importance of periodic patterns within the trajectory. Phase describes the initial timing or temporal offset of each frequency component, indicating the relative timing of periodic occurrences. In this study, we focus on the daily activity patterns reflected in individual stay-point trajectories, rather than the temporal shifts of these patterns. Therefore, we use the amplitude to extract frequency-domain features. Moreover, using only amplitude reduces the input dimensionality, which improves training stability.
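As a concrete illustration of Eqs. (1)-(5), the frequency-domain signal (amplitude spectrum) of a quadtree-code time series can be obtained with a standard FFT routine. The toy series below is an assumption used purely for demonstration.

```python
# Minimal sketch: derive the amplitude spectrum of a quadtree-code time series.
import numpy as np

def amplitude_spectrum(code_series):
    """code_series: 1-D sequence of quadtree codes treated as numbers (e.g., length 168)."""
    t = np.asarray(code_series, dtype=float)
    spectrum = np.fft.fft(t)          # complex spectrum X_k (Eq. 1)
    amplitude = np.abs(spectrum)      # A_k = sqrt(Re^2 + Im^2) (Eq. 5)
    # The spectrum of a real-valued signal is symmetric, so half of it suffices.
    return amplitude[: len(t) // 2 + 1]

# Toy weekly (168-hour) series alternating between two regions (assumed values)
toy_series = [1331211 if h % 24 < 8 else 1333233 for h in range(168)]
amp = amplitude_spectrum(toy_series)
```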
It is also worth noting that the hierarchical nature of the quadtree ensures that spatially adjacent regions often share common code prefixes, thereby preserving spatial continuity and resulting in smaller numerical differences between consecutive trajectory points. When such sequences are treated as time-domain signals, this property leads to smoother transitions and mitigates abrupt high-frequency fluctuations. As a result, the corresponding frequency-domain signals are less noisy and reveal clearer periodic structures, which in turn facilitates more accurate extraction of global activity patterns from individual stay-point trajectories.
Trajectory representation learning model based on time-frequency domain feature fusion
We construct a trajectory encoder based on time-frequency domain feature fusion, which is trained through an encoding-decoding framework with trajectory reconstruction tasks.
Encoder
The trajectory encoder is composed of an RNN-based time-domain encoder and a Transformer-based frequency-domain encoder to capture local and global trajectory features, respectively.
We design an upsampling and noise injection strategy to generate perturbed trajectories as input to the encoder. The decoder then reconstructs the original trajectories based on the trajectory embeddings produced by the encoder.
Step 1: Upsampling. A probability distribution P = {p_0, p_1, p_2, ...} is defined, where p_i denotes the probability of inserting i new trajectory points between every two adjacent original points. For example, P = {0.4, 0.3, ...} indicates a 40% chance of inserting no new points, a 30% chance of inserting one new point, and so on. The new points are generated using linear interpolation between the original trajectory points.
Step 2: Noise injection. We further apply Gaussian noise to the upsampled trajectory. A set of perturbation rates {σ_1, σ_2, ...} is defined, where each σ represents the standard deviation of the noise applied to the trajectory points. For each point p_i = (x_i, y_i), a perturbed point p_i' is generated by adding Gaussian noise sampled from N(0, σ^2) to both x_i and y_i.
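A minimal sketch of this two-step perturbation is given below; the insertion probabilities and the noise level are illustrative assumptions, not the values used in our experiments.

```python
# Sketch of trajectory perturbation: linear-interpolation upsampling + Gaussian noise.
import numpy as np

rng = np.random.default_rng(0)
INSERT_PROBS = [0.4, 0.3, 0.2, 0.1]   # P(insert i points between neighbours), assumed
NOISE_STD = 0.002                      # noise standard deviation in degrees, assumed

def perturb(points):
    """points: list of (lat, lon) tuples; returns a perturbed trajectory."""
    upsampled = []
    for (lat0, lon0), (lat1, lon1) in zip(points[:-1], points[1:]):
        upsampled.append((lat0, lon0))
        k = rng.choice(len(INSERT_PROBS), p=INSERT_PROBS)  # how many points to insert
        for i in range(1, k + 1):                          # linear interpolation
            w = i / (k + 1)
            upsampled.append((lat0 + w * (lat1 - lat0), lon0 + w * (lon1 - lon0)))
    upsampled.append(points[-1])
    # Add Gaussian noise independently to both coordinates of every point
    noisy = [(lat + rng.normal(0, NOISE_STD), lon + rng.normal(0, NOISE_STD))
             for lat, lon in upsampled]
    return noisy
```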
RNN-based time-domain encoder
The primary task of the RNN-based time-domain encoder is to learn the time-domain feature of the time-domain signal, which captures its local dependencies within the sequence of trajectory points. To mitigate the issues of vanishing or exploding gradients commonly encountered when RNNs process long sequences, we utilize GRU (Chung et al., 2014), a variant of RNN. Specifically, given the time-domain signal T of an input trajectory, the encoder produces the time-domain feature (embedding) h_time of the trajectory as follows: h_time = GRU(T).
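A possible PyTorch sketch of such a GRU-based time-domain encoder is shown below; the vocabulary size, embedding dimension, and hidden dimension are assumptions for illustration.

```python
# Minimal sketch of the GRU-based time-domain encoder (assumed dimensions).
import torch
import torch.nn as nn

class TimeDomainEncoder(nn.Module):
    def __init__(self, num_regions, emb_dim=128, hidden_dim=256):
        super().__init__()
        self.embedding = nn.Embedding(num_regions, emb_dim)   # quadtree-code IDs
        self.gru = nn.GRU(emb_dim, hidden_dim, batch_first=True)

    def forward(self, code_ids):                # code_ids: (batch, seq_len)
        x = self.embedding(code_ids)            # (batch, seq_len, emb_dim)
        _, h_n = self.gru(x)                    # h_n: (1, batch, hidden_dim)
        return h_n.squeeze(0)                   # time-domain embedding h_time
```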
Transformer-based frequency-domain encoder
The primary task of the Transformer-based frequency-domain encoder is to extract informative features from the frequency-domain signal A = (A_0, A_1, ..., A_{K-1}), i.e., the amplitude spectrum derived in Sect. 4.2. Leveraging the self-attention mechanism, the encoder captures the relative importance and interactions among different frequency components, thereby modeling the global structural patterns of the trajectory.
Each frequency component A_k is first transformed into a high-dimensional embedding e_k through two components: a frequency-aware embedding layer and a position encoding layer. Specifically:

e_k = \mathrm{FrequencyEmbedding}(A_k) + \mathrm{PositionEmbedding}(k)  (7)

where FrequencyEmbedding is implemented as a multilayer perceptron (MLP) that projects the scalar or vectorized frequency component into a latent space, and PositionEmbedding encodes the position index k of A_k to retain the sequential order of frequency components. This results in a sequence of embeddings:

E = (e_0, e_1, \dots, e_{K-1})  (8)

Feeding this embedding sequence into the Transformer encoder, we obtain the final frequency-domain representation h_freq of the trajectory:

h_{freq} = \mathrm{TransformerEncoder}(E)  (9)
This representation captures both low- and high-frequency components and their contextual dependencies, enabling the model to characterize the trajectory from a global frequency perspective.
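The following PyTorch sketch illustrates one way to realize Eqs. (7)-(9); the embedding dimensions, the number of heads and layers, and the mean pooling over frequency positions are assumptions for illustration.

```python
# Minimal sketch of the Transformer-based frequency-domain encoder (Eqs. 7-9).
import torch
import torch.nn as nn

class FrequencyDomainEncoder(nn.Module):
    def __init__(self, num_freqs, d_model=256, n_heads=4, n_layers=2):
        super().__init__()
        self.freq_embedding = nn.Sequential(        # FrequencyEmbedding (MLP)
            nn.Linear(1, d_model), nn.ReLU(), nn.Linear(d_model, d_model))
        self.pos_embedding = nn.Embedding(num_freqs, d_model)  # PositionEmbedding
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)

    def forward(self, amplitudes):                  # amplitudes: (batch, num_freqs)
        batch, k = amplitudes.shape
        e = self.freq_embedding(amplitudes.unsqueeze(-1))        # (batch, k, d_model)
        pos = torch.arange(k, device=amplitudes.device)
        e = e + self.pos_embedding(pos)                          # Eqs. 7-8
        z = self.encoder(e)                                      # Eq. 9
        return z.mean(dim=1)                        # pooled frequency embedding h_freq
```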
Feature Fusion
The trajectory encoder integrates the time- and frequency-domain encoders for generating trajectory representation vectors. Specifically, h_time and h_freq are concatenated and passed through a linear projection layer to reduce the dimensionality back to d, resulting in the time-frequency domain fused trajectory representation vector v:

v = \mathrm{Linear}([h_{time};\, h_{freq}])  (10)
While more sophisticated fusion strategies—such as gated fusion or attention-based mechanisms—could explicitly model cross-domain interactions or dynamically suppress redundant features, we deliberately adopt this lightweight and generalizable approach. The linear projection layer serves as a trainable filter that learns to automatically balance, suppress, or enhance different components in the concatenated feature vector, thereby mitigating potential redundancy and reinforcing useful complementarities between the two domains. We empirically find that this simple fusion mechanism already achieves superior performance across multiple tasks (see Sect. 5). In the ablation study, we further compare the effectiveness of using a linear projection layer versus an attention-based mechanism for feature fusion (see Sect. 5.3).
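A minimal sketch of this concatenation-and-projection fusion (Eq. 10) is given below, assuming both encoders output d-dimensional embeddings; d is an assumed value.

```python
# Sketch of time-frequency feature fusion: concatenate and project back to d dims.
import torch
import torch.nn as nn

class TimeFreqFusion(nn.Module):
    def __init__(self, d=256):            # d is an assumed embedding dimension
        super().__init__()
        self.proj = nn.Linear(2 * d, d)   # maps [h_time ; h_freq] back to d dims

    def forward(self, h_time, h_freq):
        return self.proj(torch.cat([h_time, h_freq], dim=-1))   # fused vector v
```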
Decoder
We adopt an RNN-based decoder, whose primary task is to reconstruct each original trajectory from the perturbed trajectory based on its embedding v derived from the trajectory encoder. Specifically, the decoder is guided by v to autoregressively predict all trajectory points in the original sequence T, enabling trajectory reconstruction.
Training objective
The training objective is trajectory reconstruction through an encoding-decoding framework, aiming to recover the original trajectory from the perturbed trajectory.
Following an approach similar to that of Li et al. (2018), we employ a spatial-aware loss function in place of the commonly used Negative Log Likelihood (NLL) loss (Cho et al., 2014; Sutskever et al., 2014; Bahdanau et al., 2015). Unlike NLL, the spatial-aware loss explicitly incorporates the spatial distance between predicted and ground-truth trajectory points when computing the loss. The penalty increases with this distance, so the smaller the spatial discrepancy, the lower the resulting loss:
\mathcal{L} = -\sum_{t=1}^{n} \sum_{q \in Q} w_{q, y_t} \log P\left(q \mid v, y_{<t}\right)  (11)

w_{q, y_t} = \frac{\exp\left(-D(q, y_t) / \theta\right)}{\sum_{q' \in Q} \exp\left(-D(q', y_t) / \theta\right)}  (12)
where Q denotes the set of quadtree regions, y_t is the ground-truth region of the t-th trajectory point, P(q | v, y_{<t}) is the probability assigned to region q by the decoder, and w_{q, y_t} represents the spatial proximity weight, which increases as region q becomes closer to the region containing the trajectory point to be reconstructed. D(q, y_t) denotes the spatial distance between region centroids, and θ > 0 is the spatial distance scale parameter. A smaller θ imposes a stronger penalty on regions that are spatially distant. By optimizing the above loss function, the model is effectively “encouraged” to learn the spatiotemporal dependencies within the trajectory sequence, thereby enhancing its adaptability to various downstream tasks and improving its robustness to noise.
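A hedged sketch of such a spatially aware loss, in the spirit of Eqs. (11)-(12), is given below; the tensor shapes and the precomputed centroid distance matrix are assumptions for illustration rather than the exact implementation.

```python
# Sketch: cross-entropy weighted by spatial proximity to the ground-truth region.
import torch
import torch.nn.functional as F

def spatial_aware_loss(logits, targets, centroid_dist, theta=1.0):
    """
    logits:        (batch, seq_len, num_regions) decoder scores
    targets:       (batch, seq_len) ground-truth region IDs
    centroid_dist: (num_regions, num_regions) pairwise centroid distances
    """
    log_probs = F.log_softmax(logits, dim=-1)                          # (B, L, Q)
    # Proximity weights w_{q, y_t}: closer regions receive larger weight (Eq. 12)
    weights = torch.softmax(-centroid_dist[targets] / theta, dim=-1)   # (B, L, Q)
    return -(weights * log_probs).sum(dim=-1).mean()                   # Eq. 11
```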
Experiments
Study area and dataset
We select Shenzhen and Shanghai as the study areas, both of which are first-tier metropolitan cities in China, with areas of 1,997 km² and 6,341 km², respectively. The trajectory data are derived from mobile phone signaling records provided by China Unicom, one of the three major telecommunication operators in China. As shown in Table 1, each raw trajectory point is represented in the format [User ID, Latitude, Longitude, Timestamp].
Table 1. Format of raw trajectory
| User ID | 42 | 42 | … | 42 | 42 | … | 42 |
|---|---|---|---|---|---|---|---|
| latitude | 22.729155 | 22.729155 | … | 22.738176 | 22.738176 | … | 22.729155 |
| longitude | 114.004822 | 114.004822 | … | 114.005019 | 114.005019 | … | 114.004822 |
| timestamp | 2021.11.01 00:23:11 | 2021.11.01 01:45:33 | … | 2021.11.01 23:21:13 | 2021.11.02 00:41:22 | … | 2021.11.07 23:11:34 |
The Shenzhen dataset contains trajectories of 200,000 individuals over one week in November 2021. The Shanghai dataset contains trajectories of 50,000 individuals spanning the entire month of November 2023. To ensure consistency, the Shanghai data are divided into four one-week segments, producing 200,000 one-week trajectories in total (50,000 individuals×4 weeks).
After applying quadtree-based spatial partitioning to all trajectory points, Shenzhen is divided into 2,041 spatial units, with the smallest grid measuring approximately 826 m×508 m (Fig. 4), while Shanghai is divided into 6,097 spatial units, with the smallest grid measuring approximately 552 m×616 m (Fig. 5). In comparison, the 1 km grid partitioning method would yield 1,997 uniform grids for Shenzhen and 6,341 uniform grids for Shanghai. This demonstrates that the quadtree-based method better captures the spatial heterogeneity of trajectory distributions, achieving finer resolution in dense areas without substantially increasing the overall number of spatial units.
[See PDF for image]
Fig. 4
Quadtree-based spatial partitions of Shenzhen
[See PDF for image]
Fig. 5
Quadtree-based spatial partitions of Shanghai
Furthermore, all raw trajectory points are temporally aggregated into hourly intervals and spatially mapped to quadtree-based units. If multiple spatial units appear within the same hour of a trajectory, the unit with the longest dwell time is retained. In this way, a one-week trajectory is represented as a sequence of spatial unit IDs with a fixed length of 168 (Table 2).
Table 2. Format of preprocessed trajectory
| User ID | 42 | 42 | … | 42 | 42 | … | 42 |
|---|---|---|---|---|---|---|---|
| Quadtree-code | 1331211 | 1331211 | … | 1333233 | 1333232 | … | 1243324 |
| Time Slot | 0:00–1:00 | 1:00–2:00 | … | 23:00–24:00 | 0:00–1:00 | … | 23:00–24:00 |
| Time ID | 1 | 2 | … | 23 | 24 | … | 168 |
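The hourly aggregation described above can be sketched as follows; the column names and the use of an explicit dwell-time field are assumptions for illustration.

```python
# Illustrative pandas sketch: map raw points to hourly slots and keep, per hour,
# the spatial unit with the longest dwell time (column names are assumptions).
import pandas as pd

def to_hourly_sequence(df):
    """df columns: user_id, quadtree_code, timestamp (datetime), dwell_seconds."""
    df = df.copy()
    start = df["timestamp"].min().normalize()
    df["time_id"] = ((df["timestamp"] - start).dt.total_seconds() // 3600).astype(int) + 1
    # For each user and hourly slot, keep the unit with the longest dwell time
    idx = df.groupby(["user_id", "time_id"])["dwell_seconds"].idxmax()
    return df.loc[idx, ["user_id", "time_id", "quadtree_code"]].sort_values(
        ["user_id", "time_id"])
```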
Evaluation metrics and baselines
Evaluation metrics
We evaluate the performance of the proposed TrajRL-TFF using two downstream tasks: trajectory querying and trajectory prediction.
For the trajectory querying task, given a query trajectory, we calculate its similarity to all trajectories in the candidate set using the cosine similarity between their embeddings; the rank of the query's true counterpart among the computed similarity scores serves as the querying performance measure. In our evaluation, we randomly select a set of trajectories from the test set as the database set, denoted D, and another set of trajectories as the query set, denoted Q. For each trajectory in D, we create two sub-trajectories by taking its odd-indexed and even-indexed points, thereby forming two datasets D_a and D_b. We apply the same procedure to the trajectories in Q, yielding Q_a and Q_b. For each query sub-trajectory q_a in Q_a, we compute its cosine similarity against all trajectories in D_b as well as its counterpart q_b in Q_b, and record the rank of the similarity score between q_a and q_b among all comparisons. Ideally, q_a and q_b should achieve rank 1, since they derive from the same original trajectory. The average of these ranks over all query trajectories serves as our performance metric.
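The mean-rank metric can be computed as in the following sketch, where each query embedding is ranked against the database plus its own counterpart by cosine similarity; array shapes are assumptions for illustration.

```python
# Sketch of the mean-rank metric for trajectory querying.
import numpy as np

def mean_rank(query_emb, counterpart_emb, db_emb):
    """
    query_emb:       (n_q, d) embeddings of odd-point sub-trajectories (Q_a)
    counterpart_emb: (n_q, d) embeddings of the matching even-point sub-trajectories (Q_b)
    db_emb:          (n_db, d) embeddings of database sub-trajectories (D_b)
    """
    def normalize(x):
        return x / np.linalg.norm(x, axis=1, keepdims=True)

    q, c, db = normalize(query_emb), normalize(counterpart_emb), normalize(db_emb)
    sims_db = q @ db.T                      # cosine similarity to all database items
    sims_cp = (q * c).sum(axis=1)           # cosine similarity to own counterpart
    # Rank of the counterpart: 1 + number of database items that score higher
    ranks = (sims_db > sims_cp[:, None]).sum(axis=1) + 1
    return ranks.mean()
```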
For the trajectory prediction task, given the embedding of a trajectory, we employ an LSTM model to predict the trajectory’s locations at five future time intervals—specifically at the 5th, 12th, 24th, 36th, and 48th time steps. Prediction performance is evaluated using Mean Squared Error (MSE) and Mean Absolute Error (MAE).
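One possible downstream predictor consistent with this setup is sketched below: the trajectory embedding is repeated over the prediction horizon, passed through an LSTM, and projected to coordinates at the evaluated steps. The exact predictor architecture is not detailed here, so the dimensions and design choices are assumptions.

```python
# Assumption-laden sketch of an LSTM predictor operating on trajectory embeddings.
import torch
import torch.nn as nn

HORIZONS = [5, 12, 24, 36, 48]   # evaluated future time steps

class EmbeddingPredictor(nn.Module):
    def __init__(self, emb_dim=256, hidden_dim=128, steps=48):
        super().__init__()
        self.steps = steps
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 2)   # predicts (lat, lon) per step

    def forward(self, traj_emb):               # traj_emb: (batch, emb_dim)
        x = traj_emb.unsqueeze(1).repeat(1, self.steps, 1)  # repeat embedding over steps
        out, _ = self.lstm(x)                  # (batch, steps, hidden_dim)
        coords = self.head(out)                # (batch, steps, 2)
        return coords[:, [h - 1 for h in HORIZONS], :]      # keep evaluated horizons
```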
Baselines
To validate the effectiveness of our proposed method, we compare it against nine baseline approaches: four metric-based methods, Longest Common Subsequence (LCSS), Edit Distance on Real sequence (EDR), Hausdorff, and Fréchet, as well as five representative learning-based methods, SIMformer, TrajGAT, t2Vec, TrajCL, and ST-GraphRL. The metric-based methods compute trajectory similarity by aligning trajectory points and aggregating pairwise spatial distances. In contrast, learning-based methods adopt different paradigms: SIMformer and TrajGAT learn trajectory embeddings through supervised learning, while t2Vec, TrajCL, and ST-GraphRL employ self-supervised learning frameworks.
LCSS (Vlachos et al., 2002) evaluates trajectory similarity by identifying the Longest Common Subsequence between two trajectories; a higher LCSS value indicates greater similarity.
EDR (Chen et al., 2005) measures similarity based on the number of edit operations, including insertions, deletions, and substitutions, required to transform one trajectory into another; a higher EDR value corresponds to lower similarity.
Hausdorff distance (Alt, 2009) captures the geometric similarity between two trajectories by computing the maximum of the minimum distances between points on each trajectory, making it sensitive to outliers.
Fréchet distance (Alt and Godau, 1995) accounts for the chronological order of trajectory points and intuitively measures similarity as the minimum leash length required for a person and a dog to walk along the two trajectories, respectively.
SIMformer (Yang et al., 2024) is a supervised learning model based on the Transformer architecture. It leverages similarity labels derived from metric-based methods such as Hausdorff and Fréchet distances and computes the loss based on the discrepancy between predicted and ground-truth similarities of trajectory pairs.
TrajGAT (Yao et al., 2022) is a supervised learning model that encodes trajectories using a graph attention mechanism instead of the self-attention in Transformer. It trains the model with metric-based methods as the supervisory signal.
t2Vec (Li et al., 2018) is a self-supervised model based on sequence encoding-decoding framework. It encodes 2D trajectory points into 1D IDs using uniform grid partitioning and incorporates a spatial-aware loss function to enhance the accuracy of the learned representations.
TrajCL (Chang et al., 2023) is a self-supervised model based on contrastive learning framework, which employs a dual-feature self-attention-based trajectory encoder. This encoder adaptively fuses structural feature-based attention and spatial feature-based attention.
ST-GraphRL (Huang et al., 2024) is a self-supervised model based on encoding-decoding framework. It proposes a joint encoding framework for individual trajectory representation that models spatial-temporal joint distributions and learns the intricate dependencies within trajectories.
Results and analysis
Visual analysis of trajectories’ time and frequency domain signals
To investigate whether frequency-domain features can capture trajectories’ global characteristics (including non-periodic or noise-containing ones), we conducted a clustering analysis on the frequency-domain signals of the trajectory data, using Shenzhen as a case study, which ultimately revealed four distinct types of individual trajectories. K-means and the cross-correlation function (CCF) are used as the clustering algorithm and the similarity measure, respectively. Figure 6 illustrates the cluster centers of the trajectories’ frequency-domain signals and corresponding examples of the trajectories’ time-domain signals (quadtree-code-based time series). Figure 7 further demonstrates the space-time paths of those examples.
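For reference, a simplified clustering sketch is shown below. It applies standard Euclidean K-means to the amplitude spectra, whereas our analysis additionally uses the cross-correlation function as the similarity measure; the z-scoring step and parameter values are assumptions for illustration.

```python
# Simplified sketch of clustering amplitude spectra into mobility archetypes.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

def cluster_spectra(amplitude_matrix, k=4, seed=0):
    """amplitude_matrix: (n_trajectories, n_freqs) amplitude spectra."""
    scaled = StandardScaler().fit_transform(amplitude_matrix)   # z-score each frequency
    model = KMeans(n_clusters=k, random_state=seed, n_init=10)
    labels = model.fit_predict(scaled)
    return labels, model.cluster_centers_
```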
[See PDF for image]
Fig. 6
Clustering results of trajectories' frequency-domain signals (a) Cluster centers of the frequency-domain signals (b) Examples of the corresponding time-domain signals
[See PDF for image]
Fig. 7
Examples of trajectories’ time-domain signals (represented in space-time paths) corresponding to the four clusters of frequency-domain signals (a) Trajectories corresponding to Cluster 1 (b) Trajectories corresponding to Cluster 2 (c) Trajectories corresponding to Cluster 3 (d) Trajectories corresponding to Cluster 4
Since each frequency component encapsulates information across the entire time domain, the spectral representation reveals the long-term trends of the corresponding trajectory time-domain signal (even for non-periodic or noise-affected trajectories). When the spectrum exhibits a limited number of sharp peaks, particularly at the fundamental frequency and its harmonics (integer multiples of the base frequency), it typically indicates strong periodicity in the time-domain signal. Crucially, even without prominent periodic peaks, the frequency-domain signal can still encode global variation patterns (e.g., abrupt changes, sporadic movements), as shown in our clustering results. This indicates that residents’ weekly mobility patterns can be categorized into four distinct types.
In the first cluster type, the frequency-domain signal displays no significant variation. Correspondingly, the time-domain signals show highly stable behavior throughout the week with few changes in positions, demonstrating a typical “home-stay” pattern in residents’ daily activities.
In the second cluster type, sharp peaks in the frequency-domain signal appear at the fundamental frequency and its harmonic positions. Correspondingly, the time-domain signals exhibit regular transitions among a few fixed locations, demonstrating a typical “commuter” pattern in residents’ daily activities.
In the third cluster type, the frequency-domain signal lacks prominent periodic peaks. Correspondingly, the time-domain signals exhibit either a single abrupt change or non-periodic transitions in position, demonstrating a typical “infrequent” pattern in residents’ daily activities.
In the fourth cluster type, the frequency-domain signal contains a broader range of frequency components, especially with more peaks in the high-frequency range. These high-frequency components reflect the rapid positional changes observed in the time-domain signals, demonstrating a typical “sporadic-outing” pattern in residents’ daily activities.
The visualization analysis reveals that the frequency-domain signals can effectively capture both periodic routines and non-periodic/sporadic patterns of human mobility from a global perspective. Even for non-periodic or noise-containing trajectories, the spectral representation preserves global variation patterns—enabling differentiation into meaningful clusters. Integrating trajectories’ frequency-domain signals can help enhance the representation ability of the trajectory encoder.
Performance in trajectory querying task
We evaluated the performance of our method and nine baseline approaches on the trajectory querying task using the metrics described in Sect. 5.2.1. Specifically, we randomly sampled 11,000 trajectories from the test set, selecting 1,000 as query trajectories and the remaining 10,000 as candidates for retrieval.
As shown in Table 3, metric-based methods exhibit suboptimal performance in trajectory similarity computation. For example, LCSS calculates similarity based on the length of the longest matching subsequence. However, since it primarily captures local matching patterns, it may produce misleading results when two trajectories share partial subsequences but differ substantially in their global shapes. EDR determines similarity by counting the minimum number of edit operations required for alignment. Yet, its sensitivity to small perturbations in trajectory points makes it vulnerable to minor noise or fluctuations, leading to instability in retrieval accuracy. Hausdorff distance suffers from extreme sensitivity to outliers; a few noisy points can substantially distort the similarity measure, which is problematic given the inevitable noise in real-world trajectory data. Fréchet distance, while designed to capture trajectory continuity, places undue emphasis on strict point-to-point matching and is highly influenced by local irregularities and point density variations, rendering it less effective in practice.
Table 3. Performance of different methods in trajectory querying task
| Dataset | Class | Methods | 2000 | 4000 | 6000 | 8000 | 10000 |
|---|---|---|---|---|---|---|---|
| Shenzhen | Metric-based methods | LCSS | 1.500 | 1.967 | 2.463 | 2.985 | 3.598 |
| | | EDR | 1.500 | 1.993 | 2.501 | 3.001 | 3.668 |
| | | Hausdorff | 32.938 | 64.451 | 95.142 | 126.674 | 158.342 |
| | | Fréchet | 37.016 | 72.133 | 106.214 | 141.037 | 176.160 |
| | Learning-based methods | SIMformer | 40.280 | 80.617 | 120.463 | 160.928 | 201.103 |
| | | TrajGAT | 4.391 | 7.776 | 11.332 | 14.846 | 18.151 |
| | | t2Vec | 1.655 | 2.280 | 2.875 | 3.494 | 4.174 |
| | | TrajCL | 1.213 | 1.388 | 1.569 | 1.791 | 1.979 |
| | | ST-GraphRL | 184.461 | 370.428 | 564.436 | 753.122 | 940.911 |
| | | TrajRL-TFF | **1.110** | **1.211** | **1.329** | **1.410** | **1.571** |
| Shanghai | Metric-based methods | LCSS | 1.304 | 1.597 | 1.876 | 2.286 | 2.454 |
| | | EDR | 1.314 | 1.587 | 1.884 | 2.278 | 2.501 |
| | | Hausdorff | 3.612 | 6.195 | 8.798 | 11.486 | 14.131 |
| | | Fréchet | 7.773 | 14.339 | 21.005 | 27.775 | 34.483 |
| | Learning-based methods | SIMformer | 15.671 | 30.016 | 44.244 | 58.916 | 73.573 |
| | | TrajGAT | 2.284 | 3.616 | 4.913 | 6.188 | 7.492 |
| | | t2Vec | 1.232 | 1.509 | 1.748 | 2.002 | 2.297 |
| | | TrajCL | 1.292 | 1.306 | 1.347 | 1.394 | 1.423 |
| | | ST-GraphRL | 128.729 | 256.252 | 384.370 | 515.535 | 644.872 |
| | | TrajRL-TFF | **1.071** | **1.142** | **1.205** | **1.265** | **1.331** |
Note: Columns 2000–10000 denote the candidate trajectory dataset size; the best-performing values are shown in bold.
Notably, except for ST-GraphRL, the self-supervised methods (t2Vec, TrajCL, and TrajRL-TFF) consistently outperform the supervised methods (SIMformer and TrajGAT). A plausible reason is that supervised methods rely on ground-truth similarity labels computed from traditional distance-based metrics like Hausdorff and Fréchet, whose inherent limitations may transfer noise into the supervision signal, thereby impairing model training and generalization. Furthermore, as the dataset used in this study lacks the semantic annotations employed in ST-GraphRL, its representation space may become disorganized, which could in turn lead to suboptimal performance in the trajectory querying task.
Among the self-supervised methods, t2Vec and TrajCL achieve competitive results but still fall short in effectively capturing the periodic and global structural patterns inherent in individual stay-point trajectories. In contrast, our proposed method, TrajRL-TFF, achieves the best performance across all candidate set sizes. This confirms that incorporating global features via integrating frequency-domain features significantly enhances representation learning and retrieval accuracy.
Performance in trajectory prediction task
Following the evaluation metrics described in Sect. 5.2.1, we employ Mean Absolute Error (MAE) and Mean Squared Error (MSE) to assess trajectory prediction performance. Since metric-based methods (e.g., LCSS, EDR) only compute similarity between existing trajectories without predictive capability, we compare our method exclusively against four learning-based methods. As shown in Fig. 8, ST-GraphRL, TrajGAT and TrajCL exhibit relatively stable prediction errors across different time steps, while the other three methods—SIMformer, t2Vec, and TrajRL-TFF—show a clear increase in error as the prediction horizon extends. Notably, our proposed method, TrajRL-TFF, consistently achieves significantly lower error values across all prediction steps, demonstrating robust performance in both short-term and long-term trajectory prediction.
[See PDF for image]
Fig. 8
Performance of different trajectory representation methods in trajectory prediction task (a) Shenzhen (b) Shanghai
Ablation study
We conducted four ablation experiments on the Shenzhen dataset to evaluate the effectiveness of the key components in our model:
TrajRL-TFF-FreqDrop: Removing the frequency-domain encoder and using only the RNN-based time-domain encoder as the trajectory encoder.
TrajRL-TFF-QShuffle: Shuffling the quadtree codes of the partitioned regions.
TrajRL-TFF-Grid: Replacing the quadtree-based spatial partitioning and coding method with a 1km-grid-based approach.
TrajRL-TFF-Attn: Replacing the linear layer used for time-frequency feature fusion with an attention-based layer.
The results of these ablation experiments are summarized in Table 4.
Table 4. Results of ablation experiments
| Models | 2000 | 4000 | 6000 | 8000 | 10000 |
|---|---|---|---|---|---|
| (1) TrajRL-TFF-FreqDrop | 1.359 | 1.702 | 1.998 | 2.364 | 2.692 |
| (2) TrajRL-TFF-QShuffle | 132.372 | 262.022 | 388.847 | 517.501 | 646.672 |
| (3) TrajRL-TFF-Grid | 1.531 | 2.110 | 2.682 | 3.283 | 3.860 |
| (4) TrajRL-TFF-Attn | 29.132 | 57.120 | 85.274 | 113.567 | 141.753 |
| TrajRL-TFF | **1.110** | **1.211** | **1.329** | **1.410** | **1.571** |
Note: Columns 2000–10000 denote the candidate trajectory dataset size; the best-performing values are shown in bold.
Ablation (1) shows that removing the frequency-domain encoder results in consistently inferior performance, confirming its critical role in capturing global patterns and enhancing trajectory representations.
Ablation (2) underscores the importance of quadtree codes in the initial trajectory representation. By ensuring that adjacent codes share common prefixes, the resulting time-series trajectories become smoother with fewer abrupt numerical jumps between points, thereby suppressing high-frequency noise in the transformed signal and enabling more effective identification of periodic movement patterns.
Ablation (3) demonstrates the advantages of quadtree-based spatial partitioning. Unlike uniform grid partitioning, the quadtree approach adapts to the heterogeneous spatial distribution of trajectory points, assigning finer resolution to dense regions while maintaining sufficient data in sparse areas. This adaptive partitioning improves the quality and robustness of learned representations.
Ablation (4) reveals that replacing the simple linear projection with an attention mechanism results in reduced performance. This suggests that, within our framework, the linear layer—despite its simplicity—is more effective for fusing time-domain and frequency-domain features, possibly due to its better generalization and reduced risk of overfitting.
Efficiency evaluation
We evaluate the computational efficiency of different methods in the trajectory querying task on the Shenzhen dataset. In Table 5, the columns correspond to the size of the candidate trajectory dataset, and the values report the query time in seconds. Metric-based methods exhibit significantly lower efficiency in the trajectory querying task compared to most learning-based methods. Specifically, both EDR and LCSS require point-by-point comparison of trajectory features to compute similarity, while Hausdorff and Fréchet necessitate spatial distance calculations for all points. These operations render metric-based methods inefficient for large-scale trajectory processing. In contrast, learning-based methods leverage pre-trained deep representation models to achieve substantially superior computational efficiency. Among them, t2Vec achieves the fastest performance owing to its pure Seq2Seq architecture, while TrajRL-TFF, which incorporates an additional Transformer encoder, exhibits slightly lower efficiency than t2Vec. ST-GraphRL employs a decoupled and fused two-stage encoding process, resulting in lower efficiency than the other baseline methods.
Table 5. Efficiency of different methods (in seconds) under different candidate trajectory dataset sizes
| Class | Methods | 2000 | 4000 | 6000 | 8000 | 10000 |
|---|---|---|---|---|---|---|
| Metric-based methods | LCSS | 116.83 | 234.11 | 345.62 | 456.63 | 585.13 |
| | EDR | 147.74 | 290.62 | 439.29 | 578.05 | 723.79 |
| | Hausdorff | 47.36 | 103.75 | 159.40 | 220.41 | 278.00 |
| | Fréchet | 102.99 | 199.62 | 297.34 | 405.38 | 545.40 |
| Learning-based methods | SIMformer | 8.01 | 13.42 | 18.79 | 24.07 | 29.55 |
| | TrajGAT | 1.22 | 2.05 | 2.96 | 3.90 | 5.23 |
| | t2Vec | 0.61 | 1.02 | 1.47 | 1.93 | 2.44 |
| | TrajCL | 1.54 | 2.57 | 3.63 | 4.82 | 5.89 |
| | ST-GraphRL | 256.12 | 428.30 | 599.37 | 770.52 | 942.77 |
| | TrajRL-TFF | 1.42 | 2.37 | 3.32 | 4.28 | 5.25 |
Parameter sensitivity analysis
We investigate the impact of model parameters on the performance of the trajectory querying task using the Shenzhen dataset with a candidate trajectory dataset size of 10,000.
Effect of the dimension of embeddings
As illustrated in Fig. 9(a), when varying the dimension of the trajectory embeddings from 64 to 512, TrajRL-TFF achieved its best performance at 256. Lower-dimensional features may suffer from excessive compression and potential information loss, while higher dimensions generally offer greater representational capacity but require more training data to prevent overfitting. Consequently, we ultimately selected the 256-dimensional configuration.
[See PDF for image]
Fig. 9
Parameter sensitivity analysis results (a) Dimension of embeddings (b) The number of frequency-domain encoder layers
Effect of the number of the frequency-domain encoder layers
Figure 9(b) presents the model performance across different numbers of frequency-domain encoder layers. Performance improves as the number of layers increases, but begins to decline beyond two layers. This trend occurs because additional layers enable the model to capture more complex patterns, while excessive layers introduce overfitting. Therefore, a two-layer frequency-domain encoder is ultimately adopted.
Conclusions and discussion
This study proposes a trajectory representation learning method tailored to individual stay-point trajectories, namely TrajRL-TFF. First, a quadtree-based spatial partitioning approach is adopted to divide the study area and encode the resulting spatial units. By accounting for the heterogeneous spatial distribution of trajectory points, the method assigns finer spatial resolution to regions with higher point density, offering a more adaptive and rational alternative to conventional grid-based partitioning. Second, the quadtree-code-based trajectories are treated as time series, and a dual-encoder architecture is designed by integrating a time-domain encoder and a frequency-domain encoder to capture local and global mobility patterns, respectively. This approach significantly enhances the representation learning of individual stay-point trajectories characterized by strong periodicity and behavioral archetypes. Experimental results demonstrate that TrajRL-TFF consistently outperforms baseline models in downstream tasks such as trajectory querying and prediction, confirming that the integration of time- and frequency-domain features enables a more comprehensive representation of human mobility regularities. These findings provide valuable insights for future research in trajectory representation learning and modeling.
This study also has limitations. First, the proposed approach is specifically designed for individual stay-point trajectories with regular daily activity or mobility patterns. For trajectories lacking such characteristics (e.g., vehicle movement trajectories), the frequency-domain encoder may fail to capture meaningful representations. Second, while the quadtree-based spatial partitioning accounts for the heterogeneous distribution of trajectory data, improving spatial resolution in dense regions and reducing redundancy in sparse areas compared to uniform grid partitioning, the reliance on local spatial partitioning and coding inherently limits the model's transferability to other study areas (i.e., spatial generalization capability).
Looking forward, we posit that an effective trajectory representation learning model should exhibit stronger generality, including spatial and task generalization, robustness to various noise types, and applicability to diverse trajectory types (e.g., vehicle movement trajectories, individual stay-point trajectories, and check-in trajectories). The development of a trajectory foundation model that satisfies these requirements remains a critical avenue for future research.
Acknowledgements
Thanks to the anonymous reviewers for their helpful feedback.
Authors' contributions
Kang Liu conceived the study, wrote and revised the manuscript, and created the illustrative figures. Zhiying Lin implemented the experimental code and generated the result figures. Kemin Zhu, Ling Yin, Jianze Zheng, and Shaohua Wang contributed to manuscript revision. Yongheng Feng and Yan Zhang provided part of the data used in the experiments.
Funding
This work was supported by the National Natural Science Foundation of China [42571532; 42271474]; Guangdong Basic and Applied Basic Research Foundation [2024A1515012020]; Science, Technology and Innovation Commission of Shenzhen Municipality (KCXFZ20230731093902005; JCYJ20241202125008011); the National Earth Observation Data Center.
Data availability
The data and codes used in this research are available on figshare.com with the unique identifier at the link: https://doi.org/10.6084/m9.figshare.28852241.v2. The data file comprises trajectories of 1000 individuals over a one-week period. Due to privacy concerns, more trajectory data is not available for public sharing. Additionally, we provide a Code Instruction document with step-by-step guidelines on how to reproduce our findings.
Declarations
Competing interests
The authors declare that they have no competing interests.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
Alt, H. The computational geometry of comparing shapes. Efficient Algorithms; 2009; 5760, pp. 235-248. [DOI: https://dx.doi.org/10.1007/978-3-642-03456-5_16]
Alt, H; Godau, M. Computing the Fréchet distance between two polygonal curves. International Journal of Computational Geometry & Applications; 1995; 5,
Bahdanau, D., Cho, K., and Bengio, Y. (2015). Neural machine translation by jointly learning to align and translate. In Proceedings of the 3rd International Conference on Learning Representations.
Cao, J; Li, Q; Tu, W; Wang, F. Characterizing preferred motif choices and distance impacts. PLOS ONE; 2019; 14,
Chang, Y., Qi, J., Liang, Y., & Tanin, E. (2023). Contrastive trajectory similarity learning with dual-feature attention. In Proceedings of the 39th International conference on data engineering (ICDE) (pp. 2933-2945). IEEE.
Chen, L., Özsu, M. T., & Oria, V. (2005). Robust and fast similarity search for moving object trajectories. In Proceedings of the 2005 ACM SIGMOD international conference on Management of data (pp. 491-502).
Chen, Y., Li, X., Cong, G., Bao, Z., Long, C., Liu, Y., .. & Ellison, R. (2021). Robust road network representation learning: When traffic patterns meet traveling semantics. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management (pp. 211-220).
Cho, K., van Merrienboer, B., Gulcehre, C., Bougares, F., Schwenk, H., & Bengio, Y. (2014). Learning phrase representations using RNN encoder-decoder for statistical machine translation. In Conference on Empirical Methods in Natural Language Processing (EMNLP 2014).
Chung, J., Gulcehre, C., Cho, K., and Bengio, Y. (2014). Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555
Endo, Y; Toda, H; Nishida, K; Ikedo, J. Classifying spatial trajectories using representation learning. International Journal of Data Science and Analytics; 2016; 2,
Fang, Z., Du, Y., Chen, L., Hu, Y., Gao, Y., & Chen, G. (2021, April). E2DTC: An end to end deep trajectory clustering framework via self-training. In 2021 IEEE 37th International Conference on Data Engineering (ICDE) (pp. 696-707). IEEE.
Fang, Z., Du, Y., Zhu, X., Hu, D., Chen, L., Gao, Y., & Jensen, C. S. (2022). Spatio-temporal trajectory similarity learning in road networks. In Proceedings of the 28th ACM SIGKDD conference on knowledge discovery and data mining (pp. 347-356).
Fu, TY; Lee, WC. Trembr: Exploring road networks for trajectory representation learning. Acm Transactions On Intelligent Systems And Technology; 2020; 11,
Huang, F; Lv, J; Yue, Y. Jointly spatial-temporal representation learning for individual trajectories. Computers, Environment and Urban Systems; 2024; 112, [DOI: https://dx.doi.org/10.1016/j.compenvurbsys.2024.102144] 102144.
Ji, Y; Gao, S; Huynh, T; Scheele, C; Triveri, J; Kruse, J; Bennett, C; Wen, Y. Rethinking the regularity in mobility patterns of personal vehicle drivers: A multi-city comparison using a feature engineering approach. Transactions in GIS; 2023; 27,
Jiang, J., Pan, D., Ren, H., Jiang, X., Li, C., & Wang, J. (2023, April). Self-supervised trajectory representation learning with temporal regularities and travel semantics. In 2023 IEEE 39th international conference on data engineering (ICDE) (pp. 843-855). IEEE.
Jiang, S; Ferreira, J; González, MC. Clustering daily patterns of human activities in the city. Data Mining And Knowledge Discovery; 2012; 25, pp. 478-510. [DOI: https://dx.doi.org/10.1007/s10618-012-0264-z]
Li, J., Wang, M., Li, L., Xin, K., Hua, W., & Zhou, X. (2023a). Trajectory representation learning based on road network partition for similarity computation. In International Conference on Database Systems for Advanced Applications (pp. 396-413). Springer Nature Switzerland.
Li, L., Xue, H., Song, Y., & Salim, F. (2024, October). T-jepa: A joint-embedding predictive architecture for trajectory similarity computation. In Proceedings of the 32nd ACM International Conference on Advances in Geographic Information Systems (pp. 569-572).
Li, S., Chen, W., Yan, B., Li, Z., Zhu, S., & Yu, Y. (2023b). Self-supervised contrastive representation learning for large-scale trajectories. Future Generation Computer Systems,148, 357–366.
Li, X., Zhao, K., Cong, G., Jensen, C. S., & Wei, W. (2018, April). Deep representation learning for trajectory similarity computation. In 2018 IEEE 34th international conference on data engineering (ICDE) (pp. 617-628). IEEE.
Liang, Y., Ouyang, K., Yan, H., Wang, Y., Tong, Z., & Zimmermann, R. (2021). Modeling Trajectories with Neural Ordinary Differential Equations. In Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence (pp.1498-1504).
Lin, Y; Wan, H; Guo, S; Hu, J; Jensen, CS; Lin, Y. Pre-training general trajectory embeddings with maximum multi-view entropy coding. IEEE Transactions on Knowledge and Data Engineering; 2023; 36,
Lin, Y., Wei, T., Zhou, Z., Wen, H., Hu, J., Guo, S., ... & Wan, H. (2024). TrajFM: A vehicle trajectory foundation model for region and task transferability. arXiv preprint arXiv:2408.15251.
Liu, Y., Zhao, K., Cong, G., & Bao, Z. (2020, April). Online anomalous trajectory detection with deep generative sequence modeling. In 2020 IEEE 36th International Conference on Data Engineering (ICDE) (pp. 949-960). IEEE.
Lv, J., Li, Q., Sun, Q., & Wang, X. (2018). T-CONV: A convolutional neural network for multi-scale taxi trajectory prediction. In 2018 IEEE international conference on big data and smart computing (bigcomp) (pp. 82-89). IEEE.
Ma, Z., Tu, Z., Chen, X., Zhang, Y., Xia, D., Zhou, G., ... & Gong, J. (2024). More than routing: Joint GPS and route modeling for refined trajectory representation learning. In Proceedings of the ACM Web Conference 2024 (pp. 3064-3075).
Song, C; Koren, T; Wang, P; Barabási, AL. Modelling the scaling properties of human mobility. Nature Physics; 2010; 6,
Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to sequence learning with neural networks. In Proceedings of the 28th International Conference on Neural Information Processing Systems (pp. 3104–3112).
Vlachos, M., Kollios, G., & Gunopulos, D. (2002, February). Discovering similar multidimensional trajectories. In Proceedings 18th international conference on data engineering (pp. 673-684). IEEE.
Winograd, S. On computing the discrete Fourier transform. Mathematics Of Computation; 1978; 32,
Yan, B; Zhao, G; Song, L; Yu, Y; Dong, J. Precln: Pretrained-based contrastive learning network for vehicle trajectory prediction. World Wide Web; 2023; 26,
Yang, C; Jiang, R; Xu, X; Xiao, C; Sezaki, K. SIMformer: Single-layer vanilla transformer can learn free-space trajectory similarity. Proceedings of the VLDB Endowment; 2024; 18,
Yang, P., Wang, H., Zhang, Y., Qin, L., Zhang, W., & Lin, X. (2021, April). T3s: Effective representation learning for trajectory similarity computation. In 2021 IEEE 37th international conference on data engineering (ICDE) (pp. 2183-2188). IEEE.
Yao, D., Cong, G., Zhang, C., & Bi, J. (2019, April). Computing trajectory similarity in linear time: A generic seed-guided neural metric learning approach. In 2019 IEEE 35th international conference on data engineering (ICDE) (pp. 1358-1369). IEEE.
Yao, D., Hu, H., Du, L., Cong, G., Han, S., & Bi, J. (2022, August). TrajGAT: A graph-based long-term dependency modeling approach for trajectory similarity computation. In Proceedings of the 28th ACM SIGKDD conference on knowledge discovery and data mining (pp. 2275-2285).
Yao, D., Zhang, C., Zhu, Z., Huang, J., & Bi, J. (2017, May). Trajectory clustering via deep representation learning. In 2017 international joint conference on neural networks (IJCNN) (pp. 3880-3887). IEEE.
Zhang, C., Zhou, T., Wen, Q., & Sun, L. (2022). Tfad: A decomposition time series anomaly detection architecture with time-frequency analysis. In Proceedings of the 31st ACM international conference on information & knowledge management (pp. 2497-2507).
Zhang, H., Zhang, X., Jiang, Q., Zheng, B., Sun, Z., Sun, W., & Wang, C. (2021). Trajectory similarity learning with auxiliary supervision and optimal matching. In Proceedings of the Twenty-Ninth International Conference on International Joint Conferences on Artificial Intelligence (pp. 3209-3215).
Zhu, Y., Yu, J. J., Zhao, X., Wei, X., & Liang, Y. (2024). UniTraj: Universal human trajectory modeling from billion-scale worldwide traces. arXiv e-prints, arXiv-2411.
© The Author(s) 2025. This work is published under http://creativecommons.org/licenses/by/4.0/ (the "License").