1. Introduction
Synthetic time series data is becoming increasingly important for cybersecurity applications as AI models grow more capable of processing large amounts of multimodal data (Agrawal, Kaur and Myneni, 2024). This is particularly relevant for recent AI models, which can enhance automation in threat detection, threat modeling, and the initiation of mitigation strategies (Sowmya and Mary Anita, 2023). For these use cases, synthetic data can aid in modeling rare special cases, a common weakness of AI models, and improve model generalization. It can also help address deficiencies in available datasets, such as class imbalance and the scarcity of certain classes. Biometric data, in particular, is often scarce and subject to strict data protection regulations, which hampers the development of AI-based cybersecurity models (Rüb et al., 2022). As the computational complexity of AI models continues to increase, it becomes necessary to apply data reduction methods (Reddy et al., 2020). This raises the question of whether synthetic data generation can retain sufficient information from time series data without resorting to even more complex models.
This work investigates whether a simple synthetic generator with low computational complexity can capture enough intrinsic information about individuals from phase-averaged soft biometrics to allow a classifier to identify the person correctly. The biometric case study uses gait analysis data collected with a smart insole equipped with pressure-sensitive sensors. The dataset includes 30 participants walking at a steady pace. Evaluations using various classification schemes show that a low-complexity conditional variational autoencoder (cVAE) architecture is indeed capable of generating sufficient information from phase-averaged time series data to...




