The Internet of Things (IoT) is an essential platform for industrial applications, since it enables massive systems that connect many IoT devices for analytical data collection. This capability is responsible for the exponential growth in the amount of data created by IoT devices. Such voluminous data can place extraordinary demands on the devices' limited resources, on data transfer bandwidth, and on cloud storage. Lightweight IoT data compression techniques are a practical way to deal with these problems. This paper presents an adaptable lightweight SZ lossy compression algorithm for IoT devices (SZ4IoT), a lightweight, adjusted version of the SZ lossy compression method. SZ4IoT is a local (non-distributed), interpolation-based compressor that can accommodate any sensor data type and can be implemented on low-resource microcontrollers. It operates on both univariate and multivariate time series. It was implemented and tested on various devices, including the ESP32, Teensy 4.0, and RP2040, and evaluated on multiple datasets. The experiments focus on compression ratio, compression and decompression time, normalized root mean square error (NRMSE), and energy consumption, and they demonstrate the effectiveness of the proposed approach. The compression ratio of SZ4IoT is two, three, and two times that of LTC, WQT RLE, and K RLE, respectively. For a data size of 40 KB, SZ4IoT reduced energy consumption by 31.4, 29.4, and 27.3% compared with K RLE, LTC, and WQT RLE, respectively. In addition, this paper investigates the impact of stationary versus non-stationary time series datasets on the compression ratio.
Introduction
Recent advances in sensing devices, network protocols, and communication technologies have led to the emergence of applications such as smart homes, smart healthcare, smart transportation, and smart environmental monitoring [1]. These applications are based on smart devices (objects), ranging from sensors to home appliances and smartphones, that are now everywhere around us. Together, these devices form a heterogeneous network called the Internet of Things (IoT) [2]. Each smart object has its own processing unit, memory, power supply, and communication module, so it can process, store, and transmit data wirelessly, albeit with limited capabilities. The growing demand for IoT devices across applications increases the complexity of the IoT network in terms of both density and architecture [3]. The resulting volume of sensed data increases energy and bandwidth consumption, transmission errors, traffic, latency, and congestion across the IoT network [4, 5].
One important challenge for IoT devices is energy saving, which directly affects the lifetime of the IoT network [6]. In the IoT context, energy saving matters not only from a power-supply perspective but also from a network perspective, because applications are constantly fed raw data from a variety of sensors. Data transfer is a key energy-intensive task in IoT devices, and high-frequency transmissions can rapidly deplete a device's energy resources [7].
To reduce the power consumed by IoT devices, it is necessary to reduce the amount of data they transmit over the IoT network [4]. Data reduction approaches [8, 9] and data compression techniques [4, 10, 11] have proven to be promising techniques for energy conservation in IoT devices.
Motivation
The exponential growth of IoT devices has led to an immense increase in the amount of data generated, placing extraordinary demands on limited resources, data transfer bandwidths, and cloud storage. To address these challenges, lightweight IoT data compression techniques have emerged as a practical solution. The motivation for this research stems from the need for a compression algorithm that can effectively handle various sensor data types, operate on resource-constrained IoT devices, and manage both univariate and multivariate time series.
Research questions
The research questions can be summarized as follows:
How can we design a lightweight lossy compression method specifically for IoT devices that effectively manages the trade-off between compression ratio and data distortion?
What adaptations of the current SZ compression algorithm are required to make it efficient in IoT settings, specifically on devices with restricted computing and memory capabilities?
How efficiently does the proposed SZ4IoT approach handle different types of sensor data and process multivariate time series in IoT networks?
In IoT applications, can SZ4IoT attain a higher compression ratio than current compression algorithms while keeping data distortion at acceptable levels?
In terms of compression efficiency, computational overhead, and resource usage, how does SZ4IoT perform on different IoT devices, such as the ESP32, Teensy 4.0, and RP2040?
Contribution
This paper aims to develop a lightweight lossy compression algorithm for data reduction based on the SZ compression algorithm. This lightweight algorithm provides a very high compression ratio and takes into account the trade-off between compression ratio and data distortion. This paper makes the following contributions:
An adaptable lightweight SZ lossy compression algorithm named SZ4IoT is proposed. SZ4IoT is designed specifically for IoT devices within a network. It can handle a variety of sensor data types (INT8, INT16, INT32, float, double), is capable of processing multivariate time series, and can be implemented on most IoT devices, even those with limited resources. It offers a high compression ratio while carefully balancing the trade-off between compression ratio and data distortion.
The proposed SZ4IoT is implemented across various IoT devices, including the ESP32, Teensy 4.0, and RP2040. A library for SZ4IoT has been created and made available for download via GitHub [12], allowing further implementations and enhancements.
Extensive real-world experiments were conducted with SZ4IoT, implemented in C, on various IoT devices (ESP32, Teensy 4.0, and RP2040). These experiments provide insight into compression ratio, compression and decompression times, data transmission, normalized root mean square error (NRMSE), and energy consumption. Additionally, we examined the impact of stationary versus non-stationary time series datasets on the compression ratio, as well as the relationship between multivariate time series and compression ratio. SZ4IoT is compared with existing lossy compression methods, namely Lightweight Temporal Compression (LTC), Wavelet Quantize Threshold with RLE (WQT RLE), and K Run Length Encoding (K RLE); the results show that SZ4IoT outperforms these methods in terms of compression ratio, energy consumption, and compression time.
Organization
This paper is structured as follows: Related works are introduced in Sect. 2. Section 3 explains the IoT system. In Sect. 4, SZ4IoT is discussed in more detail. Section 5 presents the experimental setup, the IoT devices used for implementing and testing the proposed compression library, and the performance evaluation of SZ4IoT. Section 6 provides a comparison with other lossy compressors. Section 7 discusses the results, and Sect. 8 concludes with insights and future perspectives.
Related works
Lossy data compression for IoT
Various lossy data compression techniques have been proposed in the literature for IoT applications to optimize data reduction, energy savings, and overall system efficiency. Recent research has presented various algorithmic approaches, each with its own strengths and limitations.
Denoising autoencoders, for instance, have been implemented for the efficient compression of biometric signals in IoT devices, showing promising results in terms of compression ratio, reconstruction error, and computational complexity [13]. Liu et al. [14] studied the use of autoencoders for high-ratio lossy compression on real-world scientific data, concluding that the autoencoder outperforms other lossy compressors such as SZ and ZFP [15] in compression ratios. However, they also noted that a reduction ratio of more than two orders of magnitude is near-impossible without seriously distorting the data.
In addition, various lightweight data compression methods fall into the encoding-techniques category. One example is the Low Complexity Data Compression (LDC) technique, which employs Huffman coding and has been shown to conserve energy, yielding an 8% decrease in energy use [16]. The general characteristics of Huffman coding, along with the details given in [16], suggest that LDC can handle binary data originating from either integers or floating-point numbers. However, it may be less efficient when signal fluctuations substantially alter the symbol probabilities, and although its compression ratio is not exceptionally high, it is reasonably effective for the specified use case. The K-RLE algorithm proposed in [17] considerably improves the compression ratio and energy usage compared with traditional Run Length Encoding (RLE). The improvement comes from modifying RLE to use K-precision: consecutive values that differ from the run's value by at most K are treated as repetitions. Consequently, unlike RLE, which is lossless, K-RLE is a lossy compression technique. The authors of [18] propose an adaptive compression technique for IoT data at the edge that optimizes data transfer time and compression ratio. The technique dynamically selects the best compression algorithm based on resource constraints, achieving significant improvements in compression efficiency and performance.
In the category of signal processing and transform-based compression, Moon et al. [19] applied signal processing algorithms along with temporal difference encoding for the lossy compression of agricultural sensor data, highlighting the difficulty of balancing data-volume reduction against information loss. They analyzed algorithms such as the Fast Walsh–Hadamard Transform (FWHT), Discrete Cosine Transform (DCT), Discrete Wavelet Transform (DWT), and Lossy Delta Encoding (LDE). The compression ratios achieved by FWHT and DCT surpassed those of the other techniques, whereas LDE exhibited the best decompression quality. In a recent review of lossy compression techniques for IoT, Correa et al. [20] identified the Fuzzy Transform (F-Transform), DCT, and DWT as some of the most prominent transform-based methods employed for IoT compression. According to the review, DCT is recognized for its ability to exploit temporal compression in IoT devices, while DWT is used for spatial compression at cluster heads.
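To make the K-precision idea concrete, the following C sketch encodes a sample stream as (value, count) pairs, letting a sample join the current run when it lies within ±K of the run's first sample. This is our own minimal illustration, not the implementation from [17]; the type and function names (`krle_run`, `krle_encode`, `krle_decode`) are hypothetical, and using the run's first sample as its representative is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* One output pair: the run's representative value and its length. */
typedef struct {
    int32_t value;   /* first sample of the run, reused for all its samples */
    uint16_t count;  /* number of consecutive samples covered by the run */
} krle_run;

/* Encode n samples with tolerance k >= 0: a sample joins the current run if
 * it lies within +/-k of the run's first sample. out needs room for n runs;
 * the number of runs written is returned. */
size_t krle_encode(const int32_t *x, size_t n, int32_t k, krle_run *out)
{
    size_t runs = 0, i = 0;
    while (i < n) {
        int32_t ref = x[i];
        uint16_t count = 1;
        while (i + count < n && count < UINT16_MAX) {
            int64_t d = (int64_t)x[i + count] - (int64_t)ref;
            if (d < 0) d = -d;
            if (d > k) break;     /* outside the K-precision band: new run */
            count++;
        }
        out[runs].value = ref;
        out[runs].count = count;
        runs++;
        i += count;
    }
    return runs;
}

/* Expand the runs back into samples; returns the number of samples written. */
size_t krle_decode(const krle_run *runs, size_t nruns, int32_t *y)
{
    size_t m = 0;
    for (size_t r = 0; r < nruns; r++)
        for (uint16_t c = 0; c < runs[r].count; c++)
            y[m++] = runs[r].value;
    return m;
}
```

With k = 0 the encoder reduces to ordinary lossless RLE; a larger k trades a per-sample error of at most k for longer runs. For the samples 10, 11, 9, 20, 21, 5 with k = 1, the sketch produces the three runs (10, 3), (20, 2), and (5, 1).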
Various strategies focusing on linear approximation for continuous data collection have also been proposed, such as a tree-structured linear approximation scheme for wireless sensor networks (WSNs) [21]. The proposed method in [21] introduces an efficient procedure for identifying optimal piecewise solutions through linear regression. This process, mindful of the inherent heterogeneity among sensors, utilizes rate-distortion allocation. Remarkably, the method outperforms Compressed Sensing (CS)-based techniques in data reduction while maintaining similar distortion levels. Similarly, the application of Piecewise Linear Approximation (PLA) has been suggested as a means to reduce the transmission of raw data while maintaining an acceptable data reconstruction tolerance [22]. Pham et al. [23] proposed an alternative, less complex algorithm for piecewise linear approximation. The authors present an innovative algorithm to solve the Piecewise Linear Approximation with Minimum number of Line Segments (PLAMLiS) problem for a given time series and an error bound. By transforming the PLAMLiS problem into a set-covering issue, the method adopts a top-down approximation approach that maintains the accuracy of the previously proposed greedy PLAMLiS algorithm but significantly reduces its computational complexity, thus achieving energy savings and efficiency.
Another category of lossy compression that has recently attracted attention is the lightweight temporal compression (LTC) scheme. This approach was first proposed in [24] and discussed further in [25, 26]. LTC represents as many consecutive points as possible with a single line segment, transmitting information only once a sample exceeds the error margin, and then repeats the process. LTC initializes with two points: the first is fixed, and the second, widened by the error margin, becomes a vertical line segment that, together with the first point, defines the set of all admissible lines. As each new point is added, this set shrinks until a point falls outside all admissible lines, which marks the end of one segment and the beginning of a new one. For instance, Klus et al. [27] introduced Direct Lightweight Temporal Compression (DLTC), a simple and efficient method for sensor-based time-series data that shows improved performance over benchmark LTC-based methods (Figs. 1, 2).
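The cone-narrowing procedure described above can be sketched in C as follows, assuming uniformly sampled data (the sample index serves as time) and an absolute error margin `e`. Each segment keeps the interval of admissible slopes from its origin; when a new point's tolerance interval no longer intersects it, the segment is closed at the previous sample on the middle line of the cone, and that knot becomes the next origin. This is a minimal sketch of the LTC idea, not the reference implementation of [24]; the knot placement and all names are our assumptions.

```c
#include <stddef.h>

/* One knot of the piecewise-linear approximation: sample index and value. */
typedef struct {
    size_t t;
    double v;
} ltc_point;

/* Compress n uniformly sampled values so that linear interpolation between
 * the returned knots stays within +/-e of every sample. out must have room
 * for n knots; the number of knots written is returned. */
size_t ltc_compress(const double *x, size_t n, double e, ltc_point *out)
{
    if (n == 0) return 0;
    size_t k = 0;
    size_t t0 = 0;              /* index of the current segment's origin */
    double v0 = x[0];           /* origin value (the first one is exact) */
    out[k++] = (ltc_point){0, v0};
    if (n == 1) return k;

    /* Admissible slopes through the origin for the second point (dt = 1). */
    double lo = x[1] - e - v0;
    double hi = x[1] + e - v0;

    for (size_t t = 2; t < n; t++) {
        double dt = (double)(t - t0);
        double s_lo = (x[t] - e - v0) / dt;
        double s_hi = (x[t] + e - v0) / dt;
        if (s_lo > hi || s_hi < lo) {
            /* No admissible line reaches x[t]: close the segment at t - 1
             * on the middle line of the cone and restart from there. */
            double mid = 0.5 * (lo + hi);
            v0 += mid * (double)(t - 1 - t0);
            t0 = t - 1;
            out[k++] = (ltc_point){t0, v0};
            lo = x[t] - e - v0;     /* dt == 1 from the new origin */
            hi = x[t] + e - v0;
        } else {
            /* Shrink the cone to the slopes that also fit x[t]. */
            if (s_lo > lo) lo = s_lo;
            if (s_hi < hi) hi = s_hi;
        }
    }
    /* Close the final segment at the last sample. */
    double mid = 0.5 * (lo + hi);
    out[k++] = (ltc_point){n - 1, v0 + mid * (double)(n - 1 - t0)};
    return k;
}

/* Rebuild the samples by linear interpolation between consecutive knots. */
void ltc_decompress(const ltc_point *p, size_t k, double *y)
{
    if (k == 1) y[p[0].t] = p[0].v;
    for (size_t j = 0; j + 1 < k; j++) {
        double span = (double)(p[j + 1].t - p[j].t);
        for (size_t t = p[j].t; t <= p[j + 1].t; t++)
            y[t] = p[j].v + (p[j + 1].v - p[j].v) * (double)(t - p[j].t) / span;
    }
}
```

Because the middle slope lies inside every constraint of its segment, reconstruction by linear interpolation recovers each sample within ±e (up to floating-point rounding). For the samples 0, 1, 2, 3, 10, 10, 10 with e = 0.5, the sketch keeps four knots, at indices 0, 3, 4, and 6.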
Fig. 1
A taxonomy of lossy compression techniques for IoT
SZ for IoT
In our previous work [11], we proposed an energy-efficient technique for collecting data from IoT devices, based on the SZ compressor originally proposed in [28]. The technique first applies fast error-bounded lossy compression to the collected data to reduce its size and then periodically transmits the compressed data to the edge. Our compression strategy accounts for the trade-off between compression ratio and data quality. We then studied the impact of lossy compression on the reconstruction of data at the edge using a supervised machine learning technique.
In a subsequent study [29], we explored the influence of compression techniques, specifically Compressed Sensing (CS), Discrete Wavelet Transform (DWT), and SZ, on the classification of time series by deep neural networks. Furthermore, in [30], we used SZ as a lightweight compressor to achieve a high compression ratio at the expense of data quality. To compensate, we proposed a deep learning approach based on a temporal convolutional network architecture to enhance the reconstructed data at the sink.
However, our previous research had certain limitations. We employed SZ solely as a lossy compressor for a single data type, namely floating-point data, and did not consider different microcontrollers with varying memory and processing capacities in our experiments. For instance, in [11], we tested our approach on a wearable smartwatch, which, although resource-constrained compared with a PC, is not as limited as smaller microcontrollers.
The motivation for adopting SZ for IoT data compression is as follows. In contrast to autoencoder-based compression [13, 14], SZ requires no training phase and adapts to a wide range of IoT data. Although the model proposed in [14] surpasses SZ in compression ratio, implementing it on constrained embedded devices is a significant challenge. Compared with encoding-based techniques [16, 17], whose performance depends on the probabilities and order of symbol occurrences, SZ's performance is independent of them. Moreover, signal processing techniques, while valid, do not achieve compression ratios as high as SZ and tend to be less efficient for multivariate time series; our preceding study [29] showed that coupling SZ with transform-based compression methods can significantly enhance the compression ratio. Linear approximation techniques [21, 22] and lightweight temporal compression are effective provided the data is smooth and temporally stable. In contrast, SZ integrates diverse interpolation strategies and delivers superior error-bounded compression for jagged and noisy time series.
IoT system
An IoT system comprises several components that cooperate to gather, transmit, process, and analyze sensed data [31, 32]. Data compression techniques are essential for maximizing the efficiency of data transmission and storage in IoT devices [33, 34]. The essential elements of an IoT system are the following [5, 35–39]:
IoT devices, including sensors and actuators, directly interact with the surrounding environment. Sensors acquire data such as temperature, humidity, and motion, while actuators execute actions based on the data, such as switching on a light or adjusting a thermostat. Lightweight compression techniques, such as the proposed SZ4IoT, delta encoding, or run-length encoding, are applied to the raw data acquired by the sensors to minimize the volume of data sent to the edge devices or gateways.
An edge gateway serves as an intermediary between IoT devices and the cloud. Edge gateways are positioned near the sensors, collect data from several of them, carry out preliminary processing (such as aggregation, filtering, and compression), and transmit the processed data to the cloud data center for further analysis. They typically have greater computational capacity than individual IoT devices, which enables them to execute complex algorithms and to process, analyze, and store data locally. Examples include IoT gateways, industrial controllers, smart routers, specialized edge servers, and Raspberry Pi devices. Because edge devices consolidate data from several sensors, they may use more computationally demanding compression methods, such as learning-based ones. Reducing the overall data volume to be transmitted to the cloud saves bandwidth and lowers transmission costs. Additionally, edge devices decompress the compressed data received from IoT devices and then apply further compression before sending the data to the cloud.
The fog gateway is an additional computing layer positioned between the edge and the cloud. Fog gateways have greater capabilities than edge devices but are situated closer to the network's edge than a centralized cloud data center. Their functions include distributed computing, intermediate processing, bandwidth optimization, and improved scalability. Examples include data center racks deployed at the network edge, specialized fog computing devices, and high-performance servers. Within an IoT system, compression at fog gateways is crucial because it reduces data volume, minimizes latency, enhances energy efficiency, optimizes storage, improves security, assists in data preparation, and enables scalability. Using compression at the fog layer enables IoT systems to manage large volumes of data effectively, reducing costs and providing faster, more secure, and more dependable services.
A cloud is a network of centralized or distributed servers that offers substantial computing power, storage, and services remotely via the Internet. Within an IoT system, the cloud is the final layer of the infrastructure and performs most of the complex data processing and analytics. The choice between lossless and lossy algorithms for cloud compression depends on the application's objectives; for instance, data that requires long-term storage may be compressed with high-ratio techniques. The cloud also decompresses data to facilitate processing, analysis, and visualization.
SZ4IoT
Background
In [28], the authors propose a fast, error-bounded lossy compression model known as SZ, designed specifically for High-Performance Computing (HPC) systems. This technique effectively reduces the size of scientific data while satisfying user fidelity requirements. SZ uses either relative or absolute error bounds to control compression errors, allowing users to set appropriate error-bound ratios, or thresholds, for lossy compression; the difference between the original and reconstructed data is bounded by this threshold. SZ can handle both one-dimensional and multi-dimensional arrays of various data types. The compression algorithm first linearizes an N-dimensional array into a one-dimensional array and then compresses the linearized array using adaptive curve-fitting models. The best-fit step uses three prediction models, Preceding Neighbor Fitting (PNF), Linear-Curve Fitting (LCF), and Quadratic-Curve Fitting (QCF), which differ in the number of preceding data points used to predict the current value. The model that produces the closest approximation is adopted. Finally, data that cannot be predicted within the error bound is compressed by analyzing its binary representation.
The SZ algorithm employs three main curve-fitting models for data prediction:
Preceding Neighbor Fitting (PNF) This model uses the previous value to estimate the current value.
Linear Curve Fitting (LCF) This model assumes that the line through the previous two consecutive values can be used to predict the current value.
Quadratic Curve Fitting (QCF) This model posits that a quadratic curve constructed from the previous three consecutive values can accurately predict the current value.
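For reference, the three models can be written as extrapolation formulas. Here $X'_{i-1}, X'_{i-2}, X'_{i-3}$ denote previously reconstructed (decompressed) values and $\hat{X}_i$ the prediction; following our reading of [28], we assume prediction uses reconstructed rather than original values so that the decompressor can repeat it exactly. The LCF and QCF coefficients follow from extrapolating the line through two equally spaced points, and the parabola through three, to the next position:

$$
\begin{aligned}
\text{PNF:}\quad \hat{X}_i &= X'_{i-1}\\
\text{LCF:}\quad \hat{X}_i &= 2X'_{i-1} - X'_{i-2}\\
\text{QCF:}\quad \hat{X}_i &= 3X'_{i-1} - 3X'_{i-2} + X'_{i-3}
\end{aligned}
$$

For example, the values 1, 4, 9 (samples of $t^2$) give a QCF prediction of $3\cdot 9 - 3\cdot 4 + 1 = 16$, the exact next square, whereas LCF predicts $2\cdot 9 - 4 = 14$.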
Lightweight SZ for IoT devices
SZ is a high-performance error-bounded lossy compression technique capable of handling vast amounts of data. Its various implementations target CPUs, GPUs, and FPGAs¹ and accommodate various data types and structures.
The original SZ code was designed for 64-bit processors, and given its size, it must be adapted before it can run on Internet of Things (IoT) devices. To this end, this paper introduces the SZ4IoT library, a compact adaptation of the SZ lossy compressor tailored for IoT devices. The library accommodates different data types, including double, float, INT8, INT16, and INT32. SZ4IoT is adapted from SZ version 2.1.12.2.
Modifications to the original code involve adjusting variable and pointer types for compatibility with IoT device compilers, which use 32-bit pointers. Given the limited memory capacity of IoT devices, all nonessential parts were removed and buffer usage was minimized. These optimizations enable the code to run on the ESP32, RP2040, and Teensy 4.0. SZ uses Gzip or Zlib for lossless compression of the lossy compression output. We selected Zlib for compatibility with Arduino devices, since the Arduino environment lacks a robust debugger, which makes adapting code more challenging. Algorithm 1 outlines SZ4IoT as implemented on the IoT devices used in this paper.
In Algorithm 1, the data type of the collected data is checked first. Based on the identified type, SZ4IoT then calls the corresponding compression and decompression functions. The core working principle is the same for all data types; only the parameters appropriate to the detected type change. The time complexity of SZ4IoT is $O(n)$, and it requires $O(n)$ memory, where $n$ denotes the number of data points per period.
In Algorithm 1, PNF, LCF, and QCF denote Preceding Neighbor Fitting, Linear Curve Fitting, and Quadratic Curve Fitting, respectively. First, the compression and decompression functions matching the data type of the collected data are assigned (line 1). Next, the algorithm allocates 2 bits per value to record whether, and by which model, the value is predictable (line 2). The key to fast encoding is to examine the data sequence value by value and determine whether each value can be predicted within the specified error bound by one of the curve-fitting models (PNF, LCF, or QCF); every predictable value is then replaced by the two-bit code of its best-fit model. The algorithm then computes the value range size 'vrs' (line 3), which converts the user-specified relative error bound into an absolute bound against which prediction errors are compared. The main steps (lines 4–30) examine each value in the 1-D array X individually. Specifically, the algorithm selects the best curve-fitting model and stores it in 'BestSol' (lines 5–8), then checks whether its prediction error is within the user-specified relative error bound. If it is, 'BestSol' is stored in the bit array (lines 10–22); otherwise, the current value is compressed by analyzing its IEEE 754 binary representation.
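A common way to realize "check the data type, then call the matching routine" in C is a table of type-specialized functions indexed by a type tag. The sketch below applies this pattern to the value-range computation used in line 3 of Algorithm 1; the enum, macro, and function names are hypothetical and are not the SZ4IoT library's API. Each specialization is a single pass over the n samples, and accumulating in double avoids overflowing the integer types.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type tags for the five supported sample types. */
typedef enum { SZ_INT8, SZ_INT16, SZ_INT32, SZ_FLOAT, SZ_DOUBLE } sz_type;

/* Generates a value-range function (max - min) for element type T.
 * The result is computed in double so that, e.g., INT32 ranges cannot overflow. */
#define DEFINE_RANGE(NAME, T)                        \
    static double NAME(const void *buf, size_t n)    \
    {                                                \
        const T *x = (const T *)buf;                 \
        double lo = (double)x[0], hi = (double)x[0]; \
        for (size_t i = 1; i < n; i++) {             \
            double v = (double)x[i];                 \
            if (v < lo) lo = v;                      \
            if (v > hi) hi = v;                      \
        }                                            \
        return hi - lo;                              \
    }

DEFINE_RANGE(range_i8, int8_t)
DEFINE_RANGE(range_i16, int16_t)
DEFINE_RANGE(range_i32, int32_t)
DEFINE_RANGE(range_f32, float)
DEFINE_RANGE(range_f64, double)

typedef double (*range_fn)(const void *buf, size_t n);

/* Dispatch table: one specialized routine per data type. */
static const range_fn range_by_type[] = {
    [SZ_INT8] = range_i8,   [SZ_INT16] = range_i16, [SZ_INT32] = range_i32,
    [SZ_FLOAT] = range_f32, [SZ_DOUBLE] = range_f64,
};

/* Value range 'vrs' of n samples of type t (0 for an empty buffer). */
double value_range(const void *buf, size_t n, sz_type t)
{
    return n == 0 ? 0.0 : range_by_type[t](buf, n);
}
```

The same table-of-function-pointers idea extends naturally to per-type compress and decompress entry points, so the caller selects a routine once and the inner loops stay free of type checks.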
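The per-value loop described above can be sketched in C for double-precision input as follows. The 2-bit code mapping (0 = unpredictable, 1 = PNF, 2 = LCF, 3 = QCF), the function names, and the buffer layout are our assumptions. Unpredictable values are stored verbatim here instead of being reduced through SZ's IEEE 754 bit analysis, and the Zlib stage is omitted. Predictions use previously reconstructed values, so the decoder can replay them exactly.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed 2-bit codes stored per value in the bit array. */
enum { CODE_UNPRED = 0, CODE_PNF = 1, CODE_LCF = 2, CODE_QCF = 3 };

/* Prediction of model m from the last three reconstructed values,
 * where p[0] = X'(i-1), p[1] = X'(i-2), p[2] = X'(i-3). */
static double predict(const double p[3], unsigned m)
{
    switch (m) {
    case CODE_PNF: return p[0];
    case CODE_LCF: return 2.0 * p[0] - p[1];
    default:       return 3.0 * p[0] - 3.0 * p[1] + p[2];
    }
}

static void push(double p[3], double v) { p[2] = p[1]; p[1] = p[0]; p[0] = v; }

/* Encode n values: bits receives (n + 3) / 4 bytes of 2-bit codes, unpred
 * the values no model predicts within rel_err * vrs. Returns their count. */
size_t sz_encode(const double *x, size_t n, double rel_err, double vrs,
                 uint8_t *bits, double *unpred)
{
    const double eb = rel_err * vrs;       /* absolute error bound */
    double p[3] = {0.0, 0.0, 0.0};
    size_t nu = 0;
    memset(bits, 0, (n + 3) / 4);
    for (size_t i = 0; i < n; i++) {
        /* Only models with enough history are candidates: min(i, 3). */
        unsigned models = i < 3 ? (unsigned)i : 3u, best = CODE_UNPRED;
        double best_err = 0.0, rec = x[i];
        for (unsigned m = CODE_PNF; m <= models; m++) {
            double err = x[i] - predict(p, m);
            if (err < 0.0) err = -err;
            if (best == CODE_UNPRED || err < best_err) { best = m; best_err = err; }
        }
        if (best != CODE_UNPRED && best_err <= eb) {
            bits[i / 4] |= (uint8_t)(best << (2 * (i % 4)));
            rec = predict(p, best);            /* what the decoder will see */
        } else {
            unpred[nu++] = x[i];               /* code stays 0 */
        }
        push(p, rec);
    }
    return nu;
}

/* Replay the codes to reconstruct n values into y; returns unpredictables used. */
size_t sz_decode(const uint8_t *bits, size_t n, const double *unpred, double *y)
{
    double p[3] = {0.0, 0.0, 0.0};
    size_t nu = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned code = (bits[i / 4] >> (2 * (i % 4))) & 3u;
        y[i] = code == CODE_UNPRED ? unpred[nu++] : predict(p, code);
        push(p, y[i]);
    }
    return nu;
}
```

In this layout each predictable value costs 2 bits instead of 64, which is where most of the size reduction comes from before any lossless stage; the decoder needs only the bit array and the list of unpredictable values.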
Experimental setup, IoT platforms, and performance metrics
Experimental setup
In this section, we conduct several experiments using the Arduino IDE to evaluate the performance of SZ4IoT on different microcontrollers, namely the ESP32, RP2040, and Teensy 4.0. The performance metrics used include compression ratio, compression time, decompression time, and energy consumption.
IoT platforms
This section presents the IoT microcontrollers that are used in this study.
ESP32
The ESP32 microcontrollers, developed by Espressif Systems, are cost-effective, power-efficient devices with built-in Wi-Fi and Bluetooth. They are well suited to IoT applications and succeed the earlier ESP8266 microcontrollers. The ESP32 series is divided into three types: (i) the ESP32-WROOM series, which offers strong dual-core performance and integrated flash memory and is ideal for Wi-Fi and Bluetooth wireless applications; (ii) the ESP32-WROVER series, which adds SPI RAM to the built-in flash memory, providing high performance and memory for IoT and gateway applications; and (iii) the ESP32-MINI series, an economical option with built-in flash memory for simple Wi-Fi and Bluetooth connectivity. This paper primarily uses the ESP32-WROOM series to implement the proposed work.
A limitation of the ESP32 is that certain libraries and tools may not be supported as well as on larger, more widely used microcontrollers [40], see Table 1.
Teensy 4.0 IoT device
The Teensy 4.0 is a high-speed microcontroller board built around a 600 MHz NXP iMXRT1062 chip with an ARM Cortex-M7 CPU, making it well suited to various IoT projects. Its Cortex-M7 processor includes a floating-point unit that supports both double (64-bit) and float (32-bit) data types. The hardware supports dynamic clock scaling while keeping timing functions accurate despite speed changes, a distinct advantage over traditional microcontrollers. It also provides a power-off function controllable through a pushbutton connected to the On/Off pin. A standout feature is its Tightly Coupled Memory, which allows single-cycle memory access through two 64-bit wide buses, separate from the M7's primary AXI bus that connects additional memory and peripherals. This faster memory access is crucial to the device's performance. Key benefits of the Teensy 4.0 are its high clock speed and performance, an extensive selection of peripherals for different uses, and sufficient RAM for memory-intensive programs. Its limitations are a somewhat higher cost than some other microcontrollers and, because of its high performance, potentially greater power consumption than lower-performance microcontrollers [41], see Table 1.
Raspberry Pi Pico (RP2040) IoT device
The Raspberry Pi Pico is a small, fast, and versatile board built around the RP2040, the flagship microcontroller chip designed by Raspberry Pi in the UK. The RP2040 has a dual-core Arm Cortex-M0+ processor and 264 KB of internal RAM, and the Pico board provides 2 MB of off-chip flash. Its many flexible I/O options include I2C, SPI, and, in particular, Programmable I/O (PIO), which enable countless applications in a compact, cost-effective package. Like the Raspberry Pi computer, the Pico has GPIO pins, making it possible to control and interface with many IoT devices. Unlike the Raspberry Pi 4, a single-board computer running a full Linux operating system, the Pico is designed for physical computing projects. It cannot work with MIPI CSI-2 camera modules because it lacks a CSI interface and sufficient memory and processing speed. The main benefits of the RP2040 are good performance across a varied range of applications, low cost combined with robust functionality, and programmable I/O (PIO) state machines. Its limitations are restricted RAM and flash capacity and, compared with well-established microcontrollers, fewer libraries and less community support [42], see Table 1.
Table 1. The primary features of IoT devices
Feature | ESP32-WROOM | Teensy 4.0 | RP2040 |
|---|---|---|---|
CPU | Dual-core Xtensa LX6 | ARM Cortex-M7 processor | Dual-core Arm Cortex-M0+ |
Clock speed | Up to 240 MHz | Up to 600 MHz | Up to 133 MHz |
Memory | 520 KB SRAM, 4 MB flash | 1 MB SRAM, 2 MB flash | 264 KB on-chip SRAM, 2 MB flash |
Wi-Fi | Wi-Fi 802.11 b/g/n | No Wi-Fi | No Wi-Fi |
Bluetooth | Bluetooth 4.2 LE | No Bluetooth | No Bluetooth |
Fig. 2
The microcontrollers used in the experiments of this paper
Performance metrics
This section presents some critical performance measures for evaluating the proposed Lightweight SZ4IoT compression approach. They are as follows:
Compression Ratio (CR) This metric is defined as follows:
$$CR = \frac{S_{o}}{S_{c}} \tag{1}$$
where $S_{o}$ refers to the original data's size before compression, and $S_{c}$ refers to the size of the data after applying the SZ4IoT compression approach.
Computation Time for Compression and Decompression (T) This metric refers to the total time taken by the compression and decompression processes.
Size of Compressed Data: This is a measure of the data size after compression, represented in bytes.
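Error Rate To assess the deviation of the reconstructed data from the original data, we use the normalized root mean square error (NRMSE), a common distortion measure. The NRMSE for the compression method can be defined as:
$$NRMSE = \frac{RMSE}{x_{max} - x_{min}} \tag{2}$$
where the RMSE is defined as:
$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left(x_i - \tilde{x}_i\right)^2} \tag{3}$$
where $x$ represents the original data, $\tilde{x}$ the reconstructed data, $x_{max}$ and $x_{min}$ the maximum and minimum of the original data, and $n$ the number of elements.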
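As a small worked example, assume the range-normalized form $NRMSE = RMSE/(x_{max} - x_{min})$ with $RMSE = \sqrt{\tfrac{1}{n}\sum_i (x_i - \tilde{x}_i)^2}$, where $\tilde{x}$ is the reconstruction. For the hypothetical values $x = (2, 4, 6, 8)$ and $\tilde{x} = (2.1, 3.9, 6.0, 8.2)$:

$$
RMSE = \sqrt{\frac{0.01 + 0.01 + 0 + 0.04}{4}} = \sqrt{0.015} \approx 0.1225,
\qquad
NRMSE \approx \frac{0.1225}{8 - 2} \approx 0.0204.
$$

Likewise, a hypothetical 4000-byte buffer compressed to 500 bytes has $CR = 4000/500 = 8$.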
Datasets
This section presents the datasets, of various sizes and types, used to evaluate SZ4IoT. For the integer datasets, we used:
EEG This dataset is from the Bonn University and includes different records (Z, O, N, F, S). It contains data from 64 electrodes placed on subjects’ scalps and sampled at 256 Hz (3.9-msec epoch) for one second. The dataset includes EEG data from awake volunteers, awake volunteers with their eyes closed, epileptic patients during seizure-free periods, and seizure activities. For performance evaluation, only Record A was used [43].
Fetal PCG Database (fpcgdb) This dataset comprises fetal phonocardiographic (PCG) signals from several healthy pregnant women in their final trimester. The recordings were made using a portable phonocardiographic device during 20-minute sessions on average [44].
MIMIC Database Numerics (mimicdb/numerics) This section of the MIMIC Database consists of periodic measurements of physiological variables obtained from bedside ICU monitors. For performance evaluation, only the heart rate was used [45].
Lightning 2 This dataset comprises transient electromagnetic events associated with lightning detected by the FORTE satellite. It features spectrograms transformed into power density time series.
Star Light Curves This dataset includes time series data of the brightness of celestial objects over time. It consists of 1000 phase-aligned starlight curves, each of length 1024.
Car This dataset features 1-D time series data of four different types of car outlines extracted from traffic videos using motion information.
Freezer Small Train This dataset, part of the REFIT project, includes data from 20 households in the Loughborough area over 2013–2014. It comprises power demand data of kitchen freezers and garage freezers in House 1.
SonyAIBO Robot Surface2 Donated by Douglas Vail and Manuela Veloso of Carnegie Mellon University, this dataset includes accelerometer data for roll, pitch, and yaw from a Sony AIBO robot.
Plane This dataset contains shape data for several fighter airplanes, including the Eurofighter, Mirage, Harrier, F-22, F-14, and F-15. It was created using digital photos processed through a color image segmentation technique.
Table 2. Various datasets showcasing their sizes, types, number of periods, and size of each period
Dataset index | Dataset name | No. of elements | Data type | No. of elements per period | No. of periods |
|---|---|---|---|---|---|
1 | Lightning2 | 38220 | FLOAT | 2940 | 13 |
2 | Lightning2 | 38220 | FLOAT | 3822 | 10 |
3 | Lightning2 | 38232 | FLOAT | 4779 | 8 |
4 | StarLightCurves | 16500 | FLOAT | 1500 | 11 |
5 | StarLightCurves | 17500 | FLOAT | 2500 | 7 |
6 | StarLightCurves | 22800 | FLOAT | 3800 | 6 |
7 | Car | 19618 | FLOAT | 1154 | 17 |
8 | Car | 34620 | FLOAT | 2308 | 15 |
9 | Car | 34620 | FLOAT | 3462 | 10 |
10 | FreezerSmallTrain | 8428 | FLOAT | 602 | 14 |
11 | FreezerSmallTrain | 8050 | FLOAT | 1150 | 7 |
12 | FreezerSmallTrain | 8427 | FLOAT | 2809 | 3 |
13 | fpcgdb/fetal_PCG_p13_GW_36 | 10000 | INT16 | 1000 | 10 |
14 | fpcgdb/fetal_PCG_p13_GW_36 | 19980 | INT16 | 3996 | 5 |
15 | fpcgdb/fetal_PCG_p13_GW_36 | 19980 | INT16 | 4995 | 4 |
16 | fpcgdb/fetal_PCG_p13_GW_36 | 10000 | INT32 | 1000 | 10 |
17 | fpcgdb/fetal_PCG_p13_GW_36 | 19980 | INT32 | 3996 | 5 |
18 | fpcgdb/fetal_PCG_p13_GW_36 | 19980 | INT32 | 4995 | 4 |
19 | EEG(Z) | 30848 | INT8 | 3856 | 8 |
20 | EEG(Z) | 57358 | INT8 | 8194 | 7 |
21 | EEG(Z) | 60000 | INT8 | 10000 | 6 |
22 | EEG(Z) | 30848 | INT16 | 3856 | 8 |
23 | EEG(Z) | 57358 | INT16 | 8194 | 7 |
24 | EEG(Z) | 30848 | INT32 | 3856 | 8 |
25 | EEG(Z) | 35000 | INT32 | 5000 | 7 |
26 | mimicdb/numerics/032n | 20200 | INT8 | 2525 | 8 |
27 | mimicdb/numerics/032n | 57358 | INT8 | 8194 | 7 |
28 | mimicdb/numerics/032n | 20200 | INT16 | 2525 | 8 |
29 | mimicdb/numerics/032n | 35350 | INT16 | 5050 | 7 |
30 | mimicdb/numerics/032n | 56000 | INT16 | 8000 | 7 |
31 | mimicdb/numerics/032n | 20200 | INT32 | 2525 | 8 |
32 | mimicdb/numerics/032n | 35350 | INT32 | 5050 | 7 |
33 | mimicdb/numerics/032n | 42000 | INT32 | 6000 | 7 |
34 | SonyAIBORobotSurface2 | 1755 | DOUBLE | 351 | 5 |
35 | SonyAIBORobotSurface2 | 1755 | DOUBLE | 585 | 3 |
36 | Car | 34620 | DOUBLE | 1154 | 30 |
37 | Car | 34620 | DOUBLE | 2308 | 15 |
38 | Car | 4620 | DOUBLE | 308 | 15 |
39 | Car | 4620 | DOUBLE | 462 | 10 |
40 | Car | 4620 | DOUBLE | 770 | 6 |
41 | StarLightCurves | 7560 | DOUBLE | 504 | 15 |
42 | StarLightCurves | 7560 | DOUBLE | 630 | 12 |
43 | StarLightCurves | 7560 | DOUBLE | 756 | 10 |
44 | FreezerSmallTrain | 8428 | DOUBLE | 301 | 28 |
45 | FreezerSmallTrain | 8428 | DOUBLE | 602 | 14 |
46 | FreezerSmallTrain | 8000 | DOUBLE | 800 | 10 |
47 | FreezerSmallTrain | 8428 | DOUBLE | 1204 | 7 |
Results, analysis, and discussions
In this section, we conduct several experiments using various performance metrics to illustrate the efficiency of implementing the proposed SZ4IoT on different IoT devices.
Selection of the relative bound ratio (relBoundRatio) parameter
The relative bound ratio (relBoundRatio) is a crucial parameter that influences the SZ4IoT compression ratio. Consequently, we performed multiple tests on the ESP32 using different values of relBoundRatio. The objective was to identify a value that balances key performance metrics such as the compression ratio and NRMSE. Table 2 lists the datasets used in the experiments, including their sizes, types, number of periods, and size of each period. The reported results are averages per period over these datasets. Note that we use the term “period” to denote a batch or window of samples in time. In the context of IoT, a period is a specific window of time during which data are processed as a batch before transmission. Thus, each period in the datasets comprises a specific number of data samples that are handled collectively.
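The batching into periods can be sketched as follows. This is a minimal illustration, not the SZ4IoT code: the helper name is hypothetical, and returning a trailing partial period separately (so that it can be buffered until more samples arrive) is our assumption, since every dataset in Table 2 is an exact multiple of its period size.

```python
def split_into_periods(samples, period_size):
    """Split a series into consecutive, non-overlapping periods (windows).

    Each full period would be handed to the compressor as one batch.
    The trailing partial period, if any, is returned separately so the
    caller can keep it in memory until enough new samples arrive.
    """
    if period_size <= 0:
        raise ValueError("period_size must be positive")
    n_full = len(samples) // period_size
    periods = [samples[i * period_size:(i + 1) * period_size] for i in range(n_full)]
    remainder = samples[n_full * period_size:]
    return periods, remainder


if __name__ == "__main__":
    # Same shape as dataset 13 in Table 2: 10000 elements, 1000 per period.
    data = list(range(10000))
    periods, rest = split_into_periods(data, 1000)
    print(len(periods), len(rest))
```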
[See PDF for image]
Fig. 3
NRMSE for various relative bound ratios (relBoundRatio) across 47 datasets
[See PDF for image]
Fig. 4
Compression ratio for various relative bound ratios (relBoundRatio) across 47 datasets
Figures 3 and 4 illustrate the variations in the NRMSE and the compression ratio, respectively, for different relative bound ratios (relBoundRatio) across 47 datasets. The relBoundRatio, specified in the configuration file sz.config, bounds the compression error relative to the global value range of the data, $R = \max(D) - \min(D)$, so that the absolute error bound is $e = \text{relBoundRatio} \times R$. For example, if relBoundRatio is set to 0.01 for a dataset $D$ whose value range is $R = 10$, the absolute error bound is $e = 0.01 \times 10 = 0.1$.
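To make the role of the error bound concrete, the sketch below converts relBoundRatio into an absolute bound and applies error-bounded linear quantization. It is a simplified, hypothetical illustration: it uses a previous-value predictor instead of SZ4IoT's interpolation-based prediction and omits the subsequent encoding stages, but it shows why every reconstructed value stays within $e$ of the original.

```python
def absolute_error_bound(data, rel_bound_ratio):
    """Absolute bound e = relBoundRatio * (max(data) - min(data))."""
    return rel_bound_ratio * (max(data) - min(data))


def quantize(data, eb):
    """Error-bounded linear quantization with a previous-value predictor.

    Simplified illustration only: SZ4IoT uses interpolation-based prediction
    and further encoding stages that are omitted here.
    """
    if eb <= 0:
        raise ValueError("error bound must be positive")
    codes, recon = [], []
    prev = 0.0  # the decompressor starts from the same initial prediction
    for x in data:
        q = round((x - prev) / (2 * eb))  # integer quantization code
        x_rec = prev + 2 * eb * q         # value the decompressor will rebuild
        codes.append(q)
        recon.append(x_rec)
        prev = x_rec  # predict from reconstructed data so errors do not accumulate
    return codes, recon


def dequantize(codes, eb):
    """Rebuild values from the codes alone, mirroring quantize()."""
    out, prev = [], 0.0
    for q in codes:
        prev = prev + 2 * eb * q
        out.append(prev)
    return out


if __name__ == "__main__":
    data = [0.0, 1.0, 2.5, 10.0, 4.0]        # value range R = 10
    eb = absolute_error_bound(data, 0.01)     # e = 0.01 * 10
    codes, recon = quantize(data, eb)
    print(codes)
    print(max(abs(x - y) for x, y in zip(data, recon)))
```

Because each code is the rounded ratio of the prediction residual to the bin width $2e$, the reconstruction error of every point is at most $e$; predicting from reconstructed rather than original values keeps the compressor and decompressor in lockstep.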
In the results depicted in Figs. 3 and 4, increasing relBoundRatio increases both the compression ratio and the NRMSE. Because this type of error bound scales with the data range, the maximum compression error of any data point is proportional to that range, which makes it suitable for datasets whose values vary significantly. Based on the experiments with 47 datasets, a relBoundRatio of 0.1 offers the best balance between a high compression ratio and an acceptable NRMSE. For unseen IoT scenarios and real-world deployments, we propose the following guidelines for selecting relBoundRatio based on the type of data generated:
Data Dynamics and Smoothness For IoT data with high smoothness and predictable patterns, such as those that can be well-approximated by curve-fitting models, a lower relBoundRatio can be used. This is because the data points are more likely to be accurately predicted, allowing for higher compression ratios without significant loss of accuracy.
Dimensionality and Scale IoT data can vary widely in dimensionality and scale. High-dimensional data or data with large value ranges may require a more carefully chosen relBoundRatio. For instance, if the value range changes significantly over time, the compression errors should be bounded relative to that range so that the changing data remain accurately represented.
Application-Specific Requirements Different IoT applications may have varying tolerance levels for compression errors. For example, environmental monitoring systems generating continuous streams of sensor data may prioritize higher compression ratios with acceptable error bounds to manage storage and transmission efficiently.
Error Sensitivity If the IoT application is highly sensitive to errors, such as in medical monitoring systems, a conservative approach with a lower relBoundRatio should be adopted to ensure that the compression errors remain within acceptable limits.
Compression ratio
In this experiment, we examined the influence of dataset size and type on the compression ratio. Figure 5 illustrates the compression ratio for various datasets, as detailed in Table 2, across different IoT devices including ESP32, Teensy 4.0, and RP2040. Each bar represents the average compression ratio obtained from SZ4IoT for different data types and sizes. The compression ratio is calculated according to Eq. 1. The data size per period depends on the data type, where the original data size is calculated as follows:
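$$S_{orig} = N_p \times \text{sizeof(data type)} \qquad (4)$$
where $N_p$ is the number of elements per period and the function sizeof(data type) returns the size of the input data type in bytes. $S_{orig}$ refers to the original data size, and the compressed data size $S_{comp}$ is also expressed in bytes.
[See PDF for image]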
Fig. 5
Compression ratio for different datasets executed on ESP32, Teensy 4.0, and RP2040. The vertical number on top of each bar represents the number of samples per period (window size)
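Eq. (4) can be mirrored on a desktop to estimate the per-period payload before compression. The sketch assumes 4-byte float and 8-byte double, as on the 32-bit microcontrollers used here, and uses the standard sizes of Python's struct module in place of the C sizeof operator; the helper name and type table are hypothetical.

```python
import struct

# Standard ('<' prefix) struct sizes for the data types used in Table 2.
TYPE_FORMATS = {"INT8": "<b", "INT16": "<h", "INT32": "<i", "FLOAT": "<f", "DOUBLE": "<d"}


def original_size_bytes(n_elements, data_type):
    """Eq. (4): number of elements in the period times sizeof(data type)."""
    return n_elements * struct.calcsize(TYPE_FORMATS[data_type])


if __name__ == "__main__":
    for dtype in ("INT16", "INT32"):
        # 1000 elements per period, as in datasets 13 and 16 of Table 2.
        print(dtype, original_size_bytes(1000, dtype))
```

For 1000 elements per period this gives 2000 bytes for INT16 and 4000 bytes for INT32; dividing by the measured compressed size then yields the compression ratio of Eq. (1).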
The compression ratio of SZ4IoT varies between 4.76 and 77.01 across all datasets in Table 2. For datasets of the same data type, the compression ratio increases as the number of elements per period grows (see datasets 13, 14, and 15 in Table 2, all of the INT16 type). This suggests that using larger periods, which store more data in memory, is beneficial, provided that the memory capacity of the microcontroller permits it. For datasets of different data types, a wider data type yields a higher compression ratio for the same number of elements per period, as shown by datasets 13 and 16 (INT16 and INT32, respectively; see Table 2). In other words, for the same data and period (window) size, SZ4IoT performs better when data points are represented with more bytes. This observation aligns with the general principle that wider data types storing the same values contain more redundancy (for example, high-order bytes that rarely change) and therefore offer more opportunities for compression: the original size grows with sizeof(data type), whereas the size of the compressed output changes little. Table 5 is consistent with this: for mimicdb/numerics/032n, the compression ratio doubles from 6.01 (int8_t) to 12.02 (int16_t) and again to 24.04 (int32_t). Lossless compression algorithms such as Huffman coding or Lempel-Ziv-Welch (LZW) likewise achieve better compression ratios when the data contain more predictable patterns or redundancies [47].
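The redundancy argument can be illustrated with a generic lossless compressor. The sketch below is not the SZ4IoT pipeline: it uses Python's built-in zlib as a stand-in and a synthetic, ADC-like sine signal (an assumption), packing the same integer values as 16-bit and as 32-bit samples and comparing the resulting compression ratios.

```python
import math
import struct
import zlib


def zlib_ratio(values, fmt_char):
    """Pack integers as fixed-width little-endian samples and return the zlib ratio."""
    raw = struct.pack("<%d%s" % (len(values), fmt_char), *values)
    return len(raw) / len(zlib.compress(raw, 9))


if __name__ == "__main__":
    # Synthetic 12-bit-like sensor signal; every value fits in 16 bits.
    values = [2048 + int(1000 * math.sin(i / 20)) for i in range(2000)]
    r16 = zlib_ratio(values, "h")  # 2 bytes per sample
    r32 = zlib_ratio(values, "i")  # 4 bytes per sample, same values
    print("int16 ratio: %.2f, int32 ratio: %.2f" % (r16, r32))
```

The two extra high-order bytes of each 32-bit sample are always zero here, so they double the original size while adding comparatively little to the compressed output, and the 32-bit ratio comes out higher.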
[See PDF for image]
Fig. 6
Transmitted data size for ESP 32, Teensy 4.0, and RP2040
Figure 6 illustrates the relationship between the size of the data within each period and the volume of transmitted data after applying the SZ4IoT compression technique. It shows that as the size of the data within each period increases, the volume of transmitted data also increases, but not in a linear manner due to the increasing compression ratio with larger batches. Upon applying the SZ4IoT compression technique, the amount of data transmitted by the IoT devices ranges between 0.119245 KB and 1.679688 KB. This reduction is significant for IoT applications, as it helps in conserving bandwidth and energy, thereby enhancing the overall efficiency and lifespan of IoT devices.
Compression and decompression time
This section investigates the compression and decompression times using the SZ4IoT approach for various datasets outlined in Table 2. Figures 7 and 8 depict the compression and decompression times delivered by SZ4IoT for different datasets across various microcontrollers.
[See PDF for image]
Fig. 7
Compression time for ESP32, Teensy 4.0, and RP2040 for various datasets
[See PDF for image]
Fig. 8
Decompression time for ESP32, Teensy 4.0, and RP2040 for various datasets
As indicated by the results in Figs. 7 and 8, the compression and decompression times of SZ4IoT on the Teensy 4.0 are shorter than those on the ESP32, which in turn are shorter than the times on the RP2040. Although the compression ratio remains consistent across the IoT devices, the compression and decompression times vary due to differences in the computational power of the microcontrollers. However, even with these differences, the processing times generally remain low and acceptable across all devices. This is particularly true when considering that a microcontroller is typically in a deep sleep state and only needs to wake up for processing, a process which does not exceed 0.1 s.
Multivariate time series
This experiment investigates the impact of compressing multivariate time series on the compression ratio. We use the Bonn University dataset, which consists of five records (features Z, O, N, F, and S), as explained in Sect. 5.4. We then combine these records into smaller datasets of 1, 2, 3, 4, and 5 features. SZ4IoT compresses each of them period by period, and the average compression ratio is calculated for each dataset. Table 3 shows the relationship between the number of features and the compression ratio.
Table 3. The multivariate time series vs the compression ratio
# | Dataset Name | No. of elements | No. of elements for each feature | No. of elements for each period | No. of periods | No. of features | Comp ratio |
|---|---|---|---|---|---|---|---|
1 | ZNFSO | 35000 | 7000 | 5000 | 7 | 5 | 50.706 |
2 | ZSFN | 35000 | 8750 | 5000 | 7 | 4 | 47.874 |
3 | ZSFO | 35000 | 8750 | 5000 | 7 | 4 | 49.081 |
4 | SFNO | 35000 | 8750 | 5000 | 7 | 4 | 43.817 |
5 | ZSF | 35000 | 11,667 | 5000 | 7 | 3 | 41.931 |
6 | ZSN | 35000 | 17500 | 5000 | 7 | 3 | 34.517 |
7 | ZSO | 35000 | 17500 | 5000 | 7 | 3 | 32.277 |
8 | SFN | 35000 | 17500 | 5000 | 7 | 3 | 40.587 |
9 | SFO | 35000 | 17500 | 5000 | 7 | 3 | 38.087 |
10 | FNO | 35000 | 17500 | 5000 | 7 | 3 | 35.786 |
11 | ZS | 35000 | 17500 | 5000 | 7 | 2 | 31.444 |
12 | ZF | 35000 | 17500 | 5000 | 7 | 2 | 31.444 |
13 | ZN | 35000 | 17500 | 5000 | 7 | 2 | 32.523 |
14 | ZO | 35000 | 17500 | 5000 | 7 | 2 | 27.556 |
15 | SF | 35000 | 17500 | 5000 | 7 | 2 | 40.313 |
16 | SN | 35000 | 17500 | 5000 | 7 | 2 | 35.169 |
17 | SO | 35000 | 17500 | 5000 | 7 | 2 | 31.076 |
18 | FN | 35000 | 17500 | 5000 | 7 | 2 | 36.051 |
19 | FO | 35000 | 17500 | 5000 | 7 | 2 | 33.589 |
20 | NO | 35000 | 17500 | 5000 | 7 | 2 | 31.066 |
21 | Z | 35000 | 35000 | 5000 | 7 | 1 | 26.526 |
22 | S | 35000 | 35000 | 5000 | 7 | 1 | 24.626 |
23 | N | 35000 | 35000 | 5000 | 7 | 1 | 35.640 |
24 | F | 35000 | 35000 | 5000 | 7 | 1 | 34.317 |
25 | O | 35000 | 35000 | 5000 | 7 | 1 | 25.593 |
The results presented in Table 3 reveal that SZ4IoT achieved compression ratios ranging from 24.626 to 35.640 for one-feature datasets, 27.556 to 40.313 for two-feature datasets, 32.277 to 41.931 for three-feature datasets, and 43.817 to 49.081 for four-feature datasets, and a ratio of 50.706 for the five-feature dataset. In other words, the compression ratio of SZ4IoT tends to increase with the number of features in the compressed dataset. This observation is consistent with the original authors of SZ [28], who explained that SZ converts multi-dimensional data into a one-dimensional array before compression and achieves higher compression ratios after this linearization step, especially on large datasets.
Based on these findings, we recommend compressing an entire multivariate time series at once, when possible, rather than dividing it into several univariate time series for separate compression.
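Compressing a multivariate series in one pass requires linearizing it into a single array first. The exact layout used by SZ4IoT is not specified here; the sketch below assumes a feature-by-feature layout (all samples of the first feature, then all samples of the second, and so on), and the helper name is hypothetical.

```python
def flatten_features(features):
    """Concatenate equal-length feature series into one 1-D array.

    Hypothetical helper: feature-by-feature (row-major) layout, i.e. all
    samples of the first feature, then all samples of the second, etc.
    """
    lengths = {len(series) for series in features.values()}
    if len(lengths) != 1:
        raise ValueError("all features must have the same number of samples")
    flat = []
    for name in features:  # dicts preserve insertion order (Python 3.7+)
        flat.extend(features[name])
    return flat


if __name__ == "__main__":
    # Two short EEG-like records combined into one array for a single compressor call.
    print(flatten_features({"Z": [1, 2, 3], "S": [4, 5, 6]}))
```

An interleaved layout (z0, s0, z1, s1, ...) is an equally valid alternative; which one compresses better depends on whether neighbouring values are more similar across time or across features.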
Stationary versus non-stationary data
In this experiment, we examine the effect of stationary and non-stationary time series datasets on the compression ratio achieved by the proposed SZ4IoT. A time series is considered stationary if the statistical properties of the process generating it, such as the mean and variance, remain constant over time; a non-stationary time series is one whose statistical properties change over time. The Augmented Dickey–Fuller (ADF) test, the most common unit root test, is used to determine whether a dataset is stationary. Its null hypothesis is that the series has a unit root, meaning that it is non-stationary. If the ADF test statistic is lower than the critical value and the p-value is below 0.05, the null hypothesis is rejected: the series has no unit root (i.e., it is stationary). If the ADF test statistic is higher than the critical values and the p-value is above 0.05, the test fails to reject the null hypothesis, and the series is considered non-stationary. The ADF test is performed with the adfuller function from the statsmodels library (statsmodels.tsa.stattools) in Python. This section uses specific datasets to illustrate the impact of stationarity on the compression ratio of SZ4IoT; their sizes, types, ADF test results, and compression ratios are listed in Table 4.
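The decision rule can be written as a small function. The sketch assumes the 5% critical value and a 0.05 significance level; the helper names are hypothetical. `adfuller` from statsmodels returns the test statistic, the p-value, the lag used, the number of observations, a dictionary of critical values keyed "1%", "5%", and "10%", and the information criterion; statsmodels is imported lazily so the rule itself has no dependencies.

```python
def classify_stationarity(adf_stat, p_value, critical_values, level="5%", alpha=0.05):
    """Apply the ADF decision rule to precomputed test outputs."""
    if adf_stat < critical_values[level] and p_value < alpha:
        return "Stationary"      # unit-root null hypothesis rejected
    return "Non-stationary"      # failed to reject the null hypothesis


def adf_classify(series, level="5%", alpha=0.05):
    """Run the statsmodels ADF test on a 1-D series and classify it."""
    from statsmodels.tsa.stattools import adfuller  # optional dependency
    adf_stat, p_value, _used_lag, _n_obs, critical_values, _ic_best = adfuller(series)
    return classify_stationarity(adf_stat, p_value, critical_values, level, alpha)


if __name__ == "__main__":
    crit = {"1%": -3.437, "5%": -2.864, "10%": -2.568}
    print(classify_stationarity(-4.222, 0.001, crit))
    print(classify_stationarity(-2.659, 0.081, crit))
```

Note that ADF statistics and critical values are negative numbers, so "lower than the critical value" means more negative.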
Table 4. Different datasets with their sizes, types, ADF test results, and compression ratios
Dataset | Data type | No. of elements | ADF statistic | p-value | Critical value (1%) | Critical value (5%) | Critical value (10%) | Result | Comp. ratio |
|---|---|---|---|---|---|---|---|---|---|
F | INT32 | 1000 | -7.067 | | -3.437 | -2.864 | -2.568 | Stationary | 19.32 |
N1 | INT32 | 1000 | -8.304 | | -3.437 | -2.864 | -2.568 | Stationary | 18.18 |
S | INT32 | 1000 | -8.546 | | -3.437 | -2.864 | -2.568 | Stationary | 15.15 |
N5 | INT32 | 1000 | -2.659 | 0.081 | -3.437 | -2.864 | -2.568 | Non-stationary | 13.41 |
O | INT32 | 1000 | -2.307 | 0.170 | -3.438 | -2.865 | -2.568 | Non-stationary | 10.96 |
Z | INT32 | 1000 | -1.863 | 0.340 | -3.437 | -2.865 | -2.568 | Non-stationary | 11.76 |
Car | FLOAT | 1000 | -4.222 | 0.001 | -3.437 | -2.864 | -2.568 | Stationary | 24.539 |
Plane | FLOAT | 1000 | -3.402 | 0.011 | -3.491 | -2.888 | -2.581 | Stationary | 36.697 |
StarLightCurves | FLOAT | 1000 | -3.898 | 0.002 | -3.437 | -2.864 | -2.568 | Stationary | 22.099 |
Lightning2 | FLOAT | 1000 | -2.201 | 0.206 | -3.437 | -2.864 | -2.568 | Non-stationary | 18.867 |
SonyAIBORobotSurface2 | FLOAT | 1000 | -2.486 | 0.119 | -3.437 | -2.864 | -2.568 | Non-stationary | 9.569 |
FreezerSmallTrain | FLOAT | 1000 | -1.489 | 0.539 | -3.437 | -2.864 | -2.568 | Non-stationary | 8.456 |
As the results in Table 4 show, SZ4IoT achieves higher compression ratios on stationary time series: within each data type, every stationary dataset is compressed better than every non-stationary one. The compression ratio ranges from 15.15 to 36.697 for stationary datasets and from 8.456 to 18.867 for non-stationary datasets across the data types considered. Moreover, we noticed that more data repetition within a time series increases the compression ratio of SZ4IoT, irrespective of whether the series is stationary or non-stationary.
While the original work of Di [28] suggests that curve-fitting models work better when the data is predictable, and the empirical tests in this work show that SZ4IoT performs better on stationary data, we cannot make broad generalizations without a formal proof. This formal analysis and proof will be addressed in future work.
Energy consumption
In this experiment, we evaluate the energy consumption involved in transmitting various sizes of data between two ESP Wroom 32 devices, using the ESP-NOW protocol. The SZ4IoT method was implemented on an ESP Wroom 32 device, and energy consumption was measured using a USB tester, as shown in Fig. 9. Each scenario was run 3500 times in succession with various data sizes, and the cumulative energy consumption (in mWh) was recorded after the final execution. It is important to note that in this experiment, the microcontroller did not enter sleep mode between successive transmissions. Figure 10 presents the energy consumption for different data sizes on the ESP Wroom 32.
[See PDF for image]
Fig. 9
USB tester used to measure the energy consumption
This study considered three scenarios:
Compression of data using the lightweight SZ4IoT without transmitting it.
Compression of data using the lightweight SZ4IoT followed by transmission.
Transmission of the original data without prior compression.
[See PDF for image]
Fig. 10
Energy consumption
Comparison with lossy compressors
In this section, we evaluate the proposed lightweight lossy compression algorithm by comparing it with three other lossy methods: Lightweight Temporal Compression (LTC), Wavelet Quantize Threshold with RLE (WQT RLE), and K Run Length Encoding (K RLE). All methods are implemented on an ESP32 device. All datasets used for comparison had a size of 1154 data points. We conducted comparisons using the EEG, mimicdb/numerics/032n, Car, and FreezerSmallTrain datasets to assess the compression ratio and compression time. For energy consumption, the assessment was conducted using the StarLightCurves dataset.
Compression ratio comparison
From the results in Table 5, the proposed lightweight SZ4IoT method achieves compression ratios between 4.24 and 47.1 across all datasets, whereas LTC, WQT RLE, and K RLE achieve ratios between 0.91 and 26.52, 1.3 and 15.18, and 1.64 and 21.98, respectively. The average compression ratios across all datasets are 21.75 for SZ4IoT, 11.81 for LTC, 6.43 for WQT RLE, and 8.66 for K RLE. SZ4IoT therefore outperforms the other methods in terms of compression ratio, although LTC achieves a higher ratio on mimicdb/numerics/032n with int8_t (7.44 versus 6.01).
Compression time comparison
As the results in Table 5 show, SZ4IoT requires less compression time than LTC and WQT RLE. Its compression time ranges from 6662.76 µs to 8905 µs across all datasets, whereas LTC, WQT RLE, and K RLE require between 10960 µs and 12776 µs, 8682 µs and 14793 µs, and 287 µs and 1503 µs, respectively. The average compression time across all datasets is 7938.2 µs for SZ4IoT, 11682.9 µs for LTC, 11877.7 µs for WQT RLE, and 534.7 µs for K RLE. Thus, while SZ4IoT is faster than LTC and WQT RLE, it is slower than K RLE.
Table 5. Comparison of the proposed SZ4IoT approach with other techniques in terms of compression ratio and compression time
Dataset | Data type | CR: SZ4IoT | CR: LTC | CR: WQT_RLE | CR: K_RLE | Time (µs): SZ4IoT | Time (µs): LTC | Time (µs): WQT_RLE | Time (µs): K_RLE |
|---|---|---|---|---|---|---|---|---|---|
EEG | int8_t | 4.24 | 0.91 | 1.3 | 1.64 | 8353 | 11150 | 12245 | 301 |
EEG | int16_t | 8.81 | 1.57 | 2.77 | 3.34 | 8719 | 11230 | 12230 | 292 |
EEG | int32_t | 17.62 | 2.36 | 4.59 | 6.68 | 8905 | 11233 | 12259 | 296 |
mimicdb/numerics/032n | int8_t | 6.01 | 7.44 | 1.75 | 2.53 | 7544 | 11505 | 8729 | 287 |
mimicdb/numerics/032n | int16_t | 12.02 | 6.41 | 3.51 | 5.07 | 7852 | 10960 | 8696 | 291 |
mimicdb/numerics/032n | int32_t | 24.04 | 9.61 | 7.03 | 10.14 | 7896 | 10967 | 8682 | 294 |
Car | Float | 26.16 | 19.89 | 7.59 | 8.01 | 8063 | 12126 | 13174 | 335 |
Car | Double | 44.2 | 26.52 | 15.18 | 10.68 | 6662 | 12776 | 14219 | 1503 |
FreezerSmallTrain | Float | 27.31 | 18.61 | 6.86 | 16.48 | 8428 | 12122 | 13750 | 332 |
FreezerSmallTrain | Double | 47.1 | 24.81 | 13.73 | 21.98 | 6959 | 12760 | 14793 | 1416 |
Average | | 21.75 | 11.81 | 6.43 | 8.66 | 7938.19 | 11682.9 | 11877.7 | 534.7 |
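The averages in Table 5, and how far SZ4IoT's average lies above each competitor's, can be recomputed from the per-dataset compression ratios. The short script below simply transcribes the table; the dictionary layout is ours.

```python
from statistics import mean

# Compression ratios from Table 5, in row order
# (EEG x3, mimicdb/numerics/032n x3, Car x2, FreezerSmallTrain x2).
CR = {
    "SZ4IoT":  [4.24, 8.81, 17.62, 6.01, 12.02, 24.04, 26.16, 44.2, 27.31, 47.1],
    "LTC":     [0.91, 1.57, 2.36, 7.44, 6.41, 9.61, 19.89, 26.52, 18.61, 24.81],
    "WQT_RLE": [1.3, 2.77, 4.59, 1.75, 3.51, 7.03, 7.59, 15.18, 6.86, 13.73],
    "K_RLE":   [1.64, 3.34, 6.68, 2.53, 5.07, 10.14, 8.01, 10.68, 16.48, 21.98],
}

averages = {name: mean(values) for name, values in CR.items()}
for name, avg in averages.items():
    # Ratio of SZ4IoT's average compression ratio to this method's average.
    print(f"{name}: average CR {avg:.3f}, SZ4IoT / {name} = {averages['SZ4IoT'] / avg:.2f}")
```

The exact means are 21.751, 11.813, 6.431, and 8.655, so SZ4IoT's average compression ratio is roughly 1.8, 3.4, and 2.5 times that of LTC, WQT RLE, and K RLE, respectively.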
Energy consumption comparison
In this experiment, we compare the energy consumption of the proposed SZ4IoT with K RLE, LTC, and WQT RLE. As illustrated in Fig. 11, the proposed SZ4IoT consumed 11, 13, 18, and 24 mWh for 10, 20, 30, and 40 KB of data, respectively, over 3500 iterations during compression followed by transmission. In contrast, K RLE consumed 23, 25, 30, and 35 mWh for the same data sizes over the same number of iterations. LTC’s consumption was 22, 24, 29, and 34 mWh, while WQT RLE consumed 24, 26, 30, and 33 mWh for 10, 20, 30, and 40 KB of data, respectively, over 3500 iterations. Hence, SZ4IoT demonstrates superior performance compared to the other methods in terms of energy efficiency.
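The relative savings follow directly from the cumulative energy values reported above. The sketch below transcribes those values and computes the saving of SZ4IoT with respect to each method as (baseline - SZ4IoT) / baseline; the helper name is ours.

```python
# Cumulative energy (mWh) over 3500 compress-and-transmit iterations, per data size in KB.
ENERGY_MWH = {
    "SZ4IoT":  {10: 11, 20: 13, 30: 18, 40: 24},
    "K_RLE":   {10: 23, 20: 25, 30: 30, 40: 35},
    "LTC":     {10: 22, 20: 24, 30: 29, 40: 34},
    "WQT_RLE": {10: 24, 20: 26, 30: 30, 40: 33},
}


def reduction_percent(baseline, proposed):
    """Relative energy saving of `proposed` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - proposed) / baseline


for method in ("K_RLE", "LTC", "WQT_RLE"):
    saving = reduction_percent(ENERGY_MWH[method][40], ENERGY_MWH["SZ4IoT"][40])
    print(f"40 KB: SZ4IoT saves {saving:.1f}% vs {method}")
```

At 40 KB this yields savings of 31.4% versus K RLE, 29.4% versus LTC, and 27.3% versus WQT RLE.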
[See PDF for image]
Fig. 11
Comparison of energy consumption
Discussion
To the best of our knowledge, no existing lightweight data compressor for resource-constrained IoT microcontrollers can handle diverse data types, operate across various IoT devices, and manage both univariate and multivariate time series. In this work, we address these needs through SZ4IoT, a tailored adaptation of the SZ compressor. Below, we discuss how SZ4IoT provides answers to the research questions posed in this study.
Research Question 1 How can we design a lightweight lossy compression method for IoT devices that manages the balance between compression ratio and data distortion?
SZ4IoT was specifically developed to meet this balance by introducing a relative bound ratio (relBoundRatio) parameter. Extensive testing across 47 datasets identified an optimal relBoundRatio of 0.1, which effectively maximizes the compression ratio while keeping data distortion (measured by NRMSE) within acceptable limits. Thus, SZ4IoT provides an adaptable solution that allows IoT deployments to customize compression settings based on data fidelity requirements.
Research Question 2 What adaptations to the SZ algorithm enhance its efficiency for resource-constrained IoT settings?
The original SZ algorithm was modified to ensure compatibility with typical IoT device compilers and architectures, including adjustments to variable and pointer types for 32-bit systems and the removal of non-essential components. Additionally, buffer usage was minimized to meet the memory constraints of devices such as the ESP32, RP2040, and Teensy 4.0. Zlib was integrated for lossless compression of the lossy output, enhancing efficiency while maintaining Arduino IDE compatibility.
Research Question 3 How effective is SZ4IoT in handling different sensor data types and multivariate time series data?
SZ4IoT successfully compresses a variety of data types (INT8, INT16, INT32, float, double) and supports multivariate time series, with compression ratios ranging from 4.76 to 77.01 depending on dataset characteristics. In multivariate scenarios, SZ4IoT achieves higher compression ratios when compressing the entire flattened dataset rather than compressing each feature individually, indicating its effectiveness in multi-feature IoT applications.
Research Question 4 Does SZ4IoT achieve a superior compression ratio while preserving data fidelity compared to other algorithms?
Compared with LTC, WQT RLE, and K RLE, SZ4IoT achieved the highest average compression ratio across the datasets: 21.75, versus 11.81 for LTC, 6.43 for WQT RLE, and 8.66 for K RLE. Additionally, for 40 KB of data, SZ4IoT reduced energy consumption by 27.3% to 31.4% compared with the other methods (WQT RLE and K RLE, respectively), further underscoring its efficiency.
Research Question 5 How does SZ4IoT perform in terms of computational overhead and resource usage across different IoT devices?
Performance testing on ESP32, Teensy 4.0, and RP2040 devices confirmed that SZ4IoT’s compression and decompression times remained low across all devices, with the Teensy 4.0 showing the fastest times and the RP2040 the slowest, though all devices completed processing within 0.1 s. Additionally, SZ4IoT achieved a significant reduction in energy consumption, decreasing it by 57.7% to 64.7% compared to transmitting uncompressed data on the ESP32.
Key Recommendations and Limitations SZ4IoT performs best when larger data windows are stored in memory prior to compression, which maximizes the compression ratio. However, this approach requires careful consideration of memory constraints. Additionally, controlling data distortion remains challenging, particularly in critical applications such as medical monitoring, where even slight distortions may affect decision-making. Fine-tuning parameters such as error bounds and quantization steps is essential for adapting SZ4IoT to specific application requirements, but the resulting settings may not be universally optimal.
In summary, SZ4IoT provides a lightweight, versatile compression tool well-suited to IoT applications, answering the research questions by addressing the challenges of efficiency, adaptability, and data fidelity in resource-constrained environments. These findings offer valuable insights for researchers and developers in IoT data management, promoting energy-efficient and effective data compression strategies.
Conclusions and perspectives
This study details the successful application of SZ4IoT across diverse microcontrollers, demonstrating its flexibility with different data types and its efficacy in resource-limited environments. The experiments revealed promising results in compression ratio, processing time, data transmission, and energy conservation. We further examined the impact of stationary versus non-stationary time series datasets on the compression ratio and explored the performance of the proposed compressor on multivariate time series. Future work aims to improve the quality of decompressed data via deep neural networks and to adapt SZ4IoT for image compression on the ESP32-CAM. Moreover, further empirical investigations covering a broader range of IoT devices and applications would help validate the algorithm’s performance.
Acknowledgements
This work has been supported by the EIPHI Graduate school (contract "ANR-17-EURE-0002").
Funding
Not applicable.
Data Availability
Not applicable.
Declarations
Conflict of interest
The authors declare no competing interests.
Ethical approval
Not applicable.
Declaration of generative AI and AI-assisted technologies in the writing process
During the preparation of this work, the authors used ChatGPT in order to improve readability and language. After using this tool/service, the authors reviewed and edited the content as needed and take full responsibility for the content of the publication.
https://szcompressor.org/
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
1. Biljana, L; Stojkoska, R; Trivodaliev, KV. A review of internet of things for smart home: challenges and solutions. J Clean Product; 2017; 140, pp. 1454-1464. [DOI: https://dx.doi.org/10.1016/j.jclepro.2016.10.006]
2. Stojkoska BR, Nikolovski Z (2017) Data compression for energy efficient iot solutions. In: 2017 25th telecommunication forum (TELFOR), pp 1–4. IEEE
3. Pattnaik, SK; Samal, SR; Bandopadhaya, S; Swain, K; Choudhury, S; Das, JK; Mihovska, A; Poulkov, V. Future wireless communication technology towards 6g iot: An application-based analysis of iot in real-time location monitoring of employees inside underground mines by using ble. Sensors; 2022; 22,
4. Idrees, SK; Idrees, AK. New fog computing enabled lossless eeg data compression scheme in iot networks. J Ambient Intell Humaniz Comput; 2022; 13,
5. Idrees, AK; Idrees, SK; Couturier, R; Ali-Yahiya, T. An edge-fog computing-enabled lossless eeg data compression with epileptic seizure detection in iomt networks. IEEE Internet Things J; 2022; 9,
6. Alhussein DA, Idrees AK, Harb H (2021) Energy-saving adaptive sampling mechanism for patient health monitoring based IoT networks. In: New Trends in Information and Communications Technology Applications: 5th International Conference, NTICT 2021, Baghdad, Iraq, Nov 17–18, 2021, Proceedings 5, pp 163–175. Springer
7. Idrees AK, Ali-Yahiya T, Idrees SK, Couturier R (2022) Energy-efficient fog computing-enabled data transmission protocol in tactile internet-based applications. In: Proceedings of the 37th ACM/SIGAPP Symposium on Applied Computing, pp 206–209
8. Hussein AM, Idrees AK, Couturier R (2022) Distributed energy-efficient data reduction approach based on prediction and compression to reduce data transmission in IoT networks. Int J Commun Syst 35
9. Idrees AK, Jaoude CA, Al-Qurabat AKM (2020) Data reduction and cleaning approach for energy-saving in wireless sensors networks of IoT. In: 2020 16th International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob), pp 1–6. IEEE
10. Al-Nassrawy KK, Idrees AK, Al-Shammary D (2022) A novel lossless EEG compression model using fractal combined with fixed-length encoding technique. In: AI and IoT for Sustainable Development in Emerging Countries, pp 439–454. Springer
11. Azar J, Makhoul A, Barhamgi M, Couturier R (2019) An energy efficient IoT data compression approach for edge machine learning. Futur Gener Comput Syst 96:168–175. https://dx.doi.org/10.1016/j.future.2019.02.005
12. Idrees SK, Azar J, Couturier R, Idrees AK, Gechter F (2022) Lightweight SZ4IoT library. https://github.com/saraidrees/SZ4IoT
13. Del Testa D, Rossi M (2015) Lightweight lossy compression of biometric patterns via denoising autoencoders. IEEE Signal Process Lett 22
14. Liu T, Wang J, Liu Q, Alibhai S, Lu T, He X (2021) High-ratio lossy compression: exploring the autoencoder to compress scientific data. IEEE Trans Big Data
15. Lindstrom P (2014) Fixed-rate compressed floating-point arrays. IEEE Trans Vis Comput Graph 20
16. da Silva MdVD, Rocha A, Gomes RL, Nogueira M (2021) Lightweight data compression for low energy consumption in Industrial Internet of Things. In: 2021 IEEE 18th Annual Consumer Communications & Networking Conference (CCNC), pp 1–2. IEEE
17. Capo-Chichi EP, Guyennet H, Friedt J-M (2009) K-RLE: a new data compression algorithm for wireless sensor network. In: 2009 Third International Conference on Sensor Technologies and Applications, pp 502–507. IEEE
18. Lu T, Xia W, Zou X, Xia Q (2020) Adaptively compressing IoT data on the resource-constrained edge. In: HotEdge
19. Moon A, Kim J, Zhang J, Son SW (2017) Lossy compression on IoT big data by exploiting spatiotemporal correlation. In: 2017 IEEE High Performance Extreme Computing Conference (HPEC), pp 1–7. IEEE
20. Correa JDA, Pinto ASR, Montez C (2022) Lossy data compression for IoT sensors: a review. Internet Things 19:100516. https://dx.doi.org/10.1016/j.iot.2022.100516
21. Wang C-M, Yen C-C, Yang W-Y, Wang J-S (2016) Tree-structured linear approximation for data compression over WSNs. In: 2016 International Conference on Distributed Computing in Sensor Systems (DCOSS), pp 43–51. IEEE
22. Al Fallah S, Arioua M, El Oualkadi A, El Asri J (2018) On the performance of piecewise linear approximation techniques in WSNs. In: 2018 International Conference on Advanced Communication Technologies and Networking (CommNet), pp 1–6. IEEE
23. Pham Ngoc D, Le TD, Choo H (2008) Enhance exploring temporal correlation for data collection in WSNs. In: 2008 IEEE International Conference on Research, Innovation and Vision for the Future in Computing and Communication Technologies, pp 204–208. IEEE
24. Schoellhammer T, Greenstein B, Osterweil E, Wimbrow M, Estrin D (2004) Lightweight temporal compression of microclimate datasets. In: 29th Annual IEEE Conference on Local Computer Networks (LCN), pp 516–524
25. Parker D, Stojanovic M, Yu C (2013) Exploiting temporal and spatial correlation in wireless sensor networks. In: 2013 Asilomar Conference on Signals, Systems and Computers, pp 442–446. IEEE
26. Sharma R (2015) A data compression application for wireless sensor networks using LTC algorithm. In: 2015 IEEE International Conference on Electro/Information Technology (EIT), pp 598–604. IEEE
27. Klus L, Klus R, Lohan ES, Granell C, Talvitie J, Valkama M, Nurmi J (2021) Direct lightweight temporal compression for wearable sensor data. IEEE Sens Lett 5
28. Di S, Cappello F (2016) Fast error-bounded lossy HPC data compression with SZ. In: 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp 730–739. IEEE
29. Azar J, Makhoul A, Couturier R, Demerjian J (2020) Robust IoT time series classification with data compression and deep learning. Neurocomputing 398:222–234. https://dx.doi.org/10.1016/j.neucom.2020.02.097
30. Azar J, Tayeh GB, Makhoul A, Couturier R (2022) Efficient lossy compression for IoT using SZ and reconstruction with 1D U-Net. Mob Netw Appl, pp 1–13
31. Hussein AM, Idrees AK, Couturier R (2023) A distributed prediction-compression-based mechanism for energy saving in IoT networks. J Supercomput 79
32. Idrees AK, Jawad LW (2023) Energy-efficient data processing protocol in edge-based IoT networks. Ann Telecommun 78
33. Khlief MS, Idrees AK (2022) Efficient EEG data compression technique for Internet of Health Things networks. In: 2022 IEEE World Conference on Applied Intelligence and Computing (AIC), pp 403–409. IEEE
34. Lin S, Lin W, Wu K, Wang S, Xu M, Wang JZ (2024) COCV: a compression algorithm for time-series data with continuous constant values in IoT-based monitoring systems. Internet Things 25:101049. https://dx.doi.org/10.1016/j.iot.2023.101049
35. Hasan BT, Idrees AK (2023) Edge computing for IoT. In: Learning Techniques for the Internet of Things, pp 1–20. Springer
36. Kant K, Jolfaei A, Moessner K (2024) IoT systems for extreme environments. IEEE Internet Things J 11
37. Idrees AK, Ali-Yahiya T, Idrees SK, Couturier R (2024) EDaTAD: energy-aware data transmission approach with decision-making for fog computing-based IoT applications. J Netw Syst Manage 32
38. Sadri AA, Rahmani AM, Saberikamarposhti M, Hosseinzadeh M (2022) Data reduction in fog computing and internet of things: a systematic literature survey. Internet Things 20:100629. https://dx.doi.org/10.1016/j.iot.2022.100629
39. Idrees AK, Khlief MS (2023) Efficient compression technique for reducing transmitted EEG data without loss in IoMT networks based on fog computing. J Supercomput 79
40. Espressif Systems (2023) ESP32-WROOM-32 datasheet. https://www.espressif.com/sites/default/files/documentation/esp32-wroom-32_datasheet_en.pdf
41. PJRC. Teensy 4.0 development board. https://www.pjrc.com/store/teensy40.html
42. Raspberry Pi. Raspberry Pi Pico-series documentation. https://www.raspberrypi.com/documentation/microcontrollers/pico-series.html
43. Andrzejak RG, Lehnertz K, Mormann F, Rieke C, David P, Elger CE (2001) Indications of nonlinear deterministic and finite-dimensional structures in time series of brain electrical activity: dependence on recording region and brain state. Phys Rev E 64
44. Cesarelli M, Ruffo M, Romano M, Bifulco P (2012) Simulation of foetal phonocardiographic recordings for testing of FHR extraction algorithms. Comput Methods Programs Biomed 107
45. Saeed M, Villarroel M, Reisner AT, Clifford G, Lehman L-W, Moody G, Heldt T, Kyaw TH, Moody B, Mark RG (2011) Multiparameter Intelligent Monitoring in Intensive Care II (MIMIC-II): a public-access intensive care unit database. Crit Care Med 39
46. Dau HA, Bagnall A, Kamgar K, Yeh C-CM, Zhu Y, Gharghabi S, Ratanamahatana CA, Keogh E (2019) The UCR time series archive. IEEE/CAA J Autom Sinica 6
47. Mitzenmacher M (2024) Introduction to data compression. Accessed 15 Sep 2024
© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024.