Content area
Real-time multi-criteria decision-making applications in fields like high-speed algorithmic trading, emergency response, and disaster management have driven the development of new types of preference queries. This is an example of a skyline search. Multi-criteria decision-making utilizes the skyline operator to extract highly significant tuples or useful data points from extensive sets of multi-dimensional databases. The user’s settings determine the results, which include all tuples whose attribute vector remains undefeated by another tuple. The extracted tuples are commonly known as the skyline set. Lately, there has been a growing trend in research studies to perform skyline queries on data stream applications. These queries consist of extracting desired records from sliding windows and removing outdated records from incoming data sets that do not meet user requirements. The datasets in these applications are extremely large and exhibit a wide range of dimensions that vary over time. Consequently, the skyline query is considered a computationally demanding task, with the challenge of achieving a real-time response within an acceptable duration. We must transport and process enormous quantities of data. Traditional skyline algorithms have faced new challenges due to limitations in data transmission bandwidth and latency. The transfer of vast quantities of data would affect performance, power efficiency, and reliability. Consequently, it is imperative to make alterations to the computer paradigm. Parallel skyline queries have attracted the attention of both scholars and the business sector. The study of skyline queries has focused on sequential algorithms and parallel implementations for multicore processors, primarily due to their widespread use. While previous research has focused on sequential algorithms, there is a limitation to comprehensive studies that specifically address modern parallel processors. While numerous articles have been published regarding the parallelization of regular skyline queries, there is a limited amount of research dedicated specifically to the parallel processing of continuous skyline queries. This study introduces PRSS, a continuous skyline technique for multicore processors specifically designed for sliding window-based data streams. The efficacy of the proposed parallel implementation is demonstrated through tests conducted on both real-world and synthetic datasets, encompassing various point distributions, arrival rates, and window widths. The experimental results for a dataset characterized by a large number of dimensions and cardinality demonstrate significant acceleration.
Introduction
The use of digital technologies, such as smart sensors, is rapidly increasing in our world. These sensors gather and keep track of a lot of data generated by the environment, and they are subject to frequent changes [1]. There is a rising trend among various entities, such as organizations, academic institutions, engineering professions, and governments, to develop efficient management strategies to handle large volumes of data, commonly referred to as big data [2]. Industries such as civil aviation [3], sensor manufacturing, and IoT [4] will heavily depend on extracting significant patterns and information from vast quantities of high-speed streamed data. This field is rapidly growing and holds great promise [5].
The skyline query has emerged as a prominent technique in the fields of decision support systems and multi-criteria analysis [6]. Efficiently identifying the best solutions within extensive datasets that have multiple evaluation criteria is a crucial strategy. Skyline queries offer a method to efficiently explore and analyze extensive datasets, identifying the most competitive choices across various dimensions. Focusing on a specific portion of data that reflects the most favorable tradeoffs aids decision-makers in making wise choices. The utilization of IoT devices [4] serves as a prime illustration of this approach, as Skyline queries can assist in identifying IoT devices that provide the most favorable balance in terms of various factors such as energy efficiency, data accuracy, and communication reliability.
Continuous skyline queries are essential for extracting valuable insights from large multi-dimensional data streams and making optimal decisions based on user preferences. Additionally, they are highly effective and commonly employed for managing environmental data monitoring, as noted by [1]. Researchers have developed various concepts and techniques to analyze and manage continuous streaming data. However, these methods are performed sequentially and do not utilize the multiple cores of the processor [7].
Consider the example of finding the best IoT device by using skyline queries on the dataset in Table 1, which shows the performance metrics of five IoT devices, taking into account things like data accuracy, energy efficiency, and communication reliability, as shown in Fig. 1a. The skyline devices represent optimal devices that strike a balance between energy savings, accurate data acquisition, and reliable communication. For instance, a smart city planner tasked with deploying IoT devices for diverse applications can leverage skyline queries to choose devices that offer the most favorable trade-offs across multiple criteria, forming a transparent and easily interpretable subset of devices that offer optimal trade-offs.
Table 1. Sample datasets of IoT devices and its criteria
Device | Energy efficiency | Data accuracy | Communication reliability |
|---|---|---|---|
0.8 | 0.9 | 0.7 | |
0.7 | 0.8 | 0.8 | |
0.9 | 0.7 | 0.6 | |
0.6 | 0.9 | 0.9 | |
0.85 | 0.75 | 0.85 |
[See PDF for image]
Fig. 1
Skyline query usage for IoT devices selections
When there are a large number of IoT devices and significant amounts of data streams, effectively processing such data becomes a challenging task. Researchers are suggesting using multicore processors and creating parallel skyline queries to speed up computations on large data streams in order to deal with this problem [7]. Traditional skyline queries, despite their effectiveness, struggle to cope with continuously streaming data from Internet of Things (IoT) devices. As a solution, researchers [8] suggest implementing a continuous skyline query mechanism that operates within a sliding window. This mechanism will effectively adjust to real-time variations in device performance. The main objective of our study is to create and assess a Parallel Range Search Skyline (PRSS) algorithm specifically designed for multicore processors. This PRSS algorithm provides a scalable solution for conducting continuous skyline queries on high-dimensional data streams.
In the ensuing sections, we delve into the limitations of sequential skyline queries, highlighting their inefficiencies in handling large datasets and their failure to leverage multicore processors effectively [7]. The adoption of parallel skyline queries, coupled with multicore processors, emerges as a strategic response to the challenges associated with complex streaming datasets [9]. We offer an extensive review of current skyline queries, clarify the underlying methodology and the design principles of our parallel algorithm PRSS, and completely analyze the experimental results using both real and synthetic datasets. The conclusion summarizes our findings, which will serve as a foundation for future research in the constantly evolving field of skyline queries and data stream processing.
Background and related works
Within this section, we will initially introduce the symbols (Table 2) and explanations of the terminology employed throughout this document. This will be followed by a review of the relevant literature and related works to provide a comprehensive context for our study.
Table 2. Summary of notations
Notation | Meaning |
|---|---|
Input data stream for processing skyline | |
Number of dimensions of | |
A | index |
one dimensional index | |
Z | the size of streaming window (count and time based window) |
tuples in the streaming data | |
s | Skyline tuple |
dimensional value of the incoming tuple x on dimension i | |
Set of skyline tuples of | |
dominates | |
does not dominate | |
is incomparable to |
Skyline queries
Since Börzsönyi’s [10] development of the skyline operator, this approach to extracting interesting objects from multi-dimensional datasets has become a major issue in database research. The Skyline operator’s widespread effectiveness in fields as wide-ranging as quantitative economics, market research, environmental monitoring, data mining, and visualization is largely responsible for its enormous reputation. In such contexts, the database tuples are considered a set of multidimensional data points, and the skyline query determines the points that are the best choices between the different dimensions without using cumulative functions to identify the best results but instead based on the user’s preferences.
Definition 1
(Dominance [11]) Given x an dimensional tuple, let’s denote the value of the tuple x at the dimension i by x[i] where . let x and two tuples, denotes that x[i] is better than and , it means, , Denotes that is not worse than x[i]. Therefore, .
A tuple x dominates a tuple , represented by , iff for every dimension , we get , and at a minimum one dimension , we get . let x and two tuples, we represent such x does not dominate , and we represent namely x and are incomparable, it means, . We expand to sets of tuples, similarly, denotes that the tuple x dominates every tuple in denotes that the tuple x is incomparable with every tuple in X.
Table 3 shows a sample sliding window of 6 dimensions that includes 10 tuples , of which 5 are skyline tuples while the greater than order is applied to all dimensions.
Table 3. A sample database with , , and
Tuple ID | N cores | RAM (gigabytes) | Storage (gigabytes) | Camera (megapixels) | Battery (mAh) | Screen (pixels) | Skyline |
|---|---|---|---|---|---|---|---|
8 | 12 | 64 | 5 | 3000 | 500 | Yes | |
10 | 16 | 128 | 8 | 4000 | 600 | Yes | |
12 | 20 | 256 | 10 | 4500 | 700 | Yes | |
9 | 14 | 128 | 6 | 3500 | 550 | No | |
11 | 18 | 192 | 9 | 3800 | 650 | No | |
7 | 10 | 32 | 4 | 2500 | 450 | Yes | |
14 | 22 | 512 | 12 | 5000 | 800 | Yes | |
6 | 8 | 16 | 3 | 2000 | 400 | No | |
13 | 24 | 384 | 11 | 4800 | 750 | No | |
8 | 15 | 256 | 7 | 3700 | 600 | No |
Example 1
from the set of 10 tuples in Table 3, , and , and no tuple dominates , and does not dominate any tuples. Therefore the Skyline is .
Definition 2
(Icomparability [11]) Given two dimensional tuples , it is said that x and are incomparable on , that is, if x and do not dominate each other simultaneously. denoted as , iff and . This property facilitates checking if one or more tuples can be Skyline tuples. A tuple in the skyline set needs to be incomparable to all other tuples in the set.
Definition 3
(Continuous Skyline [11]) Given a dimensional continuous dataset, a tuple is a skyline tuple iff where . The skyline of continuous dataset is the total set of skyline tuples such that, .
The Skyline algorithms have proven to be powerful queries in related database development studies after being the subject of extensive study and different applications such as environment monitoring [12], IoT [4], Aviation industry [13, 14] Social Media analisys [15]. Generally, the introduced skyline techniques can be grouped into algorithms that extract the skyline of a dataset that can be stored in main memory(in-core algorithms) and algorithms that use secondary memory(out-of-core) [16].
Main-memory computation (in-core skyline queries)
queries to identify the skyline when the dataset can be contained in main memory. Main memory algorithms, also known as in-core algorithms, are those developed specifically for processing data that can be loaded entirely within a machine’s main memory simultaneously such as the brute-force BF skyline and the Sort-based SB skyline algorithms presented in [16]. Obviously, query processing can be done more efficiently if the entire data set can be stored in main memory [17]. But generally, the dataset is much bigger than the volume of main memory that is commonly available [18]. Therefore, effective methods are needed to obtain the skyline for data stored on disk [10].
Algorithms for secondary memory (out of core skyline queries)
Although it is preferable to launch skyline processing in main memory, there are cases in which this is not practical [18]. Since modern applications typically utilize very sizable datasets, such datasets would require more storage than is available in the machine’s main memory. Because of this, there is a need for algorithmic ways to compute skylines that can be used on data sets that are stored on disk [10]. The primary solution does not require any unique indexing methods. Many datasets, for example, are collected and stored without ever being indexed. The second solution is the use of methods that rely on a particular indexing mechanism for points in multiple dimensions [16]. In fact, indexes are generally widely accessible and can be employed to speed up skyline computation [19].
Index-free techniques: Methods that do not require to index the set of tuples are called “index-free” BNL and [10], Iskyline [20], VP and KISB [21], and Object Space Partitioning (OSP) [22]. As a result, their cost will exceed that of index-based methods. Such algorithms suffer from the problem of high-cost computations, such as presorting or point-to-point comparisons, which needs to be fixed [16].
Index-based techniques: you can quickly get to the data you need using an index without having to search through data that’s not related to your objective. It is possible that the running time of these skyline queries could be significantly accelerated by utilizing an index on particular attributes [19]. Algorithms using an [23, 24], Btree [25, 26, 27–28] SkyMap a trie-based index [29]. Index-basesd techniques will face challenges of availability for only a specific data type or the cost of the indexing structure that will outweigh its advantages. There are also dominance relation-based skyline query techniques that decrease comparisons between tuples, control the skyline with tree or lattice structures, and utilize the incomparability to skip unnecessary comparison tests BSkyTree [30] and BJRtree [31].
Advanced skyline processing
Parallel-distributed approaches are also created to handle the enormous size of the dataset [7, 32]. The use of multiple resources (multicore processors) and precise management of insertions and deletions are critical factors in achieving scalability in skyline computation [33]. Parallel, distributed, and continuous data set algorithms can be used to enable skyline computation on large and complex datasets. These methods provide the necessary speedup in situations where processing efficiency cannot be guaranteed by single-processor algorithms [7]. Although there are massive benefits to using skyline processing, one challenge in particular must be taken into account: the number of skyline tuples cannot be regulated. In reality, the number of skyline tuples is determined by the number of attributes in the data and how the data is distributed. Several options have been proposed to regulate the outcome. Some of them simply add an additional criterion to the skyline result, while others modify the definition of dominance or use alternative methods to prune the result set [16].
Distributed and parallel techniques
In order to process queries more quickly, we need to create new methods benefiting from parallel computation, which is practical with multicore processors to boost performance [7]. To resolve this, sequential code must be rewritten as parallel code that runs on multiple processor cores simultaneously to boost performance without changing the algorithm’s objective or characteristics [34]. The fundamental idea behind parallelizing Skyline methods is to maximize efficiency by making full use of all of the processor’s cores all at once, so that the optimal decision can be made as quickly as possible [35, 36–37]. However, while there are many references on skyline queries, there is remarkably little study of employing multi-core processors for skyline computation, according to our previous survey [7]. To ensure scalable query processing, supplying additional resources is an ideal solution. Parallel or distributed methods may be very useful for skyline queries [9, 38, 39–40].
Making use of multiple resources is one way to ensure scalable query processing. Skyline queries, like many others, may benefit significantly from parallel or distributed processing. Skyline queries can be efficiently executed in distributed environments using the index-based algorithms presented in Balke et al. [25] and Lo et al. [26]. The work of Wu et al. [41] is another look into the problem of using content-based data partitioning to run Skyline queries in parallel on many computers. The proposed algorithm, named DSL for distributed skyline query processing, finds the skyline points incrementally.
Gao et al. [42] introduced a novel method for parallel computation of skyline queries in multi-disk environments, leveraging parallel R-trees to enhance efficiency. Their approach emphasizes concurrent access to entries across multiple disks and employs effective pruning techniques to optimize query performance. The research investigates the challenges of skyline queries in such environments, where objects are distributed across disks and accessed in parallel. Their study explores parallel skyline computation using a multi-disk architecture with a single processor. The primary objective is to enhance entry access from multiple disks simultaneously, improving non-qualifying point pruning. The focus lies on optimizing the distribution of parallel RTree nodes and data space partitioning within a parallel share-nothing architecture. In this context, Gao et al. developed specialized parallel algorithms, including Basic Parallel Skyline (BPS) and Improved Parallel Skyline (IPS), which utilize parallel Rtrees across multiple heaps corresponding to the number of disks. To further improve efficiency, the IPS algorithm incorporates a pruning strategy based on dominating regions. Additionally, the paper discusses adaptations of existing algorithms for parallel skyline computation in shared-anything architectures. A case study examines skyline retrieval with distributed multidimensional points stored across multiple disks.
Wang et al. [43] investigate the use of P2P networks for processing skyline queries. To better regulate peer access and search messages when processing skyline queries, the authors proposed a method called SSP that partitions and numbers the data space between the peer nodes. Later, authors [44] presented the SKYFRAME method, a generalization of the SSP method that allows skyline processing to be performed without identifying the starting peer.
Vlachou et al. [45] propose SKYPEER, a threshold-based algorithm for efficient subspace skyline processing in a P2P environment. SKYPEER reduces the amount of data sent over the network connecting the peers.
Since it is nearly impossible to ensure full and accurate query answers without an exhaustive search, Hose [45] presents a study on the effective processing of skyline queries in large-scale P2P systems. To reduce the load on nodes, approximate algorithms backed by probabilistic guarantees are proposed. When obtaining precise answers to skyline queries is prohibitively expensive, approximate algorithms are proposed as an alternative, as suggested in Li et al. [46]. The proposed algorithms use heuristics based on the local semantics of peer nodes, which yield high-quality results. In addition, Hose and Vlachou [32] present a comprehensive review of skyline processing in highly distributed environments.
Skyline query processing methods in mobile ad-hoc networks (MANETs) are examined in Huang et al. [47], with the primary goal of minimizing communication overhead between nodes and maximizing device throughput. Li and Xiong [48] investigate query processing and optimization strategies for WSNs. Later they proposed the SKY-SEARCH algorithm, which efficiently calculates the skyline with the highest existential probability.
Vlachou et al. [49] first proposed and designed to operate in shared-nothing architectures. A master node that serves as the coordinator and a group of worker nodes constitute the abstract architecture. The coordinator’s job is to accept requests and assign tasks to the related workers.
The dataset of interest P, is made up of a set of d-dimensional points that must be partitioned across N servers. Where , each partition contains some subset of the points in P. For effective parallel processing, a user-friendly data partitioning scheme is required. There are three stages to the algorithm:
Partitioning Phase: The data is partitioned among all of the available worker nodes. Local Processing Phase: Based just on local data, each worker calculates the skyline. Merging Phase: Based on the local results that the workers produced in the earlier phase, the coordinator calculates the final skyline result.
Zhang et al. [22] utilize an object-based space partitioning methodology to execute k-dominant skyline queries in parallel. The object-based space partitioning (OSP) technique entails the recursive subdivision of d-dimensional space into regions relative to a skyline object, thereby avoiding point comparisons between entire regions and thereby minimizing the number of operations required. The utilization of spatial dominance relationships by the OSP scheme can effectively decrease the quantity of subspaces that require processing due to the inherent inability of skyline objects to be dominated by other objects. The strategies commonly employed involve recursive techniques, such as partitioning based on pivot points, for the purpose of arranging the identified skyline points into a skyline tree. The optimization of both dominance and incomparability relationships was further carried out by OSPS.
Skyline in data stream environments
Most real-world use cases involve datasets that are highly dynamic, meaning that objects may be added or removed instantly. Different queries were designed to tackle the data stream as classified in Fig. 2:
[See PDF for image]
Fig. 2
Taxonomy of continuous skyline query processing over data stream
The primary challenge in identifying a continuous skyline involves determining and updating skyline results within a specific timeframe (a time-based window) or after receiving a specific number of tuples (count-based). In a time-based window, all records have an equal validity time of size , which begins when the record is joined to the dataset . Given that each record x has a time interval that starts with its arrival time and ends with its expiration time and removal from the dataset, this time interval is the validity time of each record, written as . Each time, the type-count-based windows have a number of the most recently arrived tuples. The equation for a point x’s arrival and departure times is , where is the size of the sliding window. When a window of size for a recently entered tuple arrives, no matter what time it arrives, the entered tuple is no longer valid. Taking previous continuous skyline characteristics, we proposed a parallel continuous technique called PRSS to use the parallelism of multicore processors to find quickly.
Historically, Papadias et al.’s [23] continuous skyline query is among the earliest and most widely used variants of the classical skyline query. Processing algorithms for the classical skyline query assume static attribute values for database objects; processing algorithms for the continuous skyline query calculate the attributes of the data objects “on-the-fly” during the execution of their algorithmic steps.
Papadias et al. [23] proposed an efficient algorithm to process the continuous skyline query, which is an extension of the well-known BBS algorithm for handling the static skyline query given an indexed dataset using the Rtree. The only difference between this and the BBS algorithm is that the R-tree entries are now added into the heap and ordered based on their mindist distance to the query point, which mindist is determined on-the-fly when the entry is considered for the first time (recall that in the classical BBS algorithm, the Rtree entries are inserted into the heap and ordered according to their distance to the origin point of the space).
Yu et al. [50] proposed a window-based algorithm for skyline queries, which splits skyline queries into a large number of continuous window queries. Tao and Papadias [8] investigated skyline processing on sliding window, proposing algorithms that continuously monitor data entry and perform incremental skyline maintenance. In Morse et al. [51], the authors proposed a skyline algorithm that works well over a continuous time interval. In Huang et al. [52], they proposed using a kinetic-based data structure to simplify the processing of skyline queries for mobile objects.
To safely prune parts of the dataset and speed up the execution of any future queries, Sacharidis et al. [53] propose a cache-aware algorithm. On top of that, Han et al. [54] recently developed a strategy to efficiently process the continuous skyline query over massive data by first retrieving the tuples in continuous sorted lists in a round-robin fashion till an early termination condition is satisfied and then computing the continuous skyline results.
Several recent research efforts, such as Zeighami et al. [55] and Wang et al. [56], have introduced frameworks to address the demand for a secure skyline calculation in encrypted datasets for services in which dealing with sensitive data is an essential concern, such as healthcare or data outsourcing.
Balke et al. [25] adapted the skyline problem for the Web, a process in which the attributes of an object are split across multiple Web-accessible servers (various dimensions are stored at various data sites). In such a distributed environment, they proposed two algorithms Basic distributed skyline (BDS) and improved distributed skyline (IDS) algorithms that determine the skyline and return all skyline objects at once.
The BDS algorithm uses a round-robin pattern to retrieve attribute values using sorted access on each site. Until it has successfully retrieved all of the attribute values of a random point (called the terminating object), the algorithm will continue in the same way. Pruning can occur at points that are not “seen” during this procedure if they do not dominate the terminating object. Once at least one dimension has been retrieved for a given point, the central site requests all attribute values for that point from each site. The algorithm then removes the outlying points, obtaining the final skyline.
In order to reduce the number of visits to the lists, the IDS algorithm is an enhanced version of BDS that does not access the list in a round-robin fashion but instead continually accesses the most promising list in regard to earlier terminating object identification.
In [57], authors studied “thick skylines” which suggest not only skyline objects but also their nearby neighbors within a predefined threshold distance. Sampling-and-pruning and indexing-and-estimating are two effective algorithms that are suggested for locating such thick skylines. For mining thick skylines in three different use cases, the authors proposed three algorithms: sampling and pruning, indexing and estimation, and micro-cluster-based algorithms.
Tao and Papadias [8] introduced the Eager algorithm for the sliding window data stream model, which effectively resolves the issue through the implementation of lazy updates and pre-computation techniques. The preservation of objects is achieved through the utilization of a list in both LLEager and LLLazy algorithms.
Two strategies for skyline computation are commonly used: the lazy strategy, which holds off most computations until a point’s expiration, and the eager strategy, which involves a pre-computation phase to minimize storage costs and only retain points that may be part of the skyline in the future. The initial approach involves the management of these instances through the employment of the preprocessing module (LPM) and the maintenance module (LMM), respectively. Upon the arrival of a novel tuple r, the algorithm conducts an assessment to determine whether it is dominated by any pre-existing point within the database skyline set, denoted as DBsky, that includes all the skyline points. Items that are not dominant are assigned to the DBsky set, while those that are dominant are assigned to the DBrest set. It is worth noting that the placement of the point in the DBrest set is due to the possibility of it being identified as a skyline point in the future, in cases where another point has expired and cannot be pruned entirely. The insertion of a tuple into DBsky has the potential to establish dominance over pre-existing points within the skyline. Points that have not yet expired are pruned if their expiration time is earlier than the points that dominate them.
Xin et al. [58] introduced an algorithm called SWSMA (Sliding Window Skyline Monitoring Algorithm) that utilizes a filter-based approach to compute the skyline in sensor networks. The SWSMA algorithm has been proposed as a means of achieving energy efficiency in the maintenance of sliding windows over wireless sensor networks. This algorithm utilizes two distinct filters within each sensor node with the aim of reducing the volume of data that is transmitted and thereby conserving energy. By reducing the amount of data that is transferred between sensor nodes, SWSMA is able to achieve significant energy savings.
Morse et al. [51] introduced the LookOut algorithm as a solution to the real-time skyline query problem on non-static datasets with validity issues in the time interval of data items. The algorithm utilizes a quadtree index structure to support continuous skyline queries. Data that holds a validity time interval refers to a specific data element within a dataset that is associated with a valid time interval, as determined by the data item’s validity time. The utilization of time-based windows was initially implemented in the LookOut algorithm for the purpose of conducting continuous skyline queries. The absence of any pruning strategy in this solution results in a significant increase in memory usage.
Sarkas et al. [59] looked into the categorical skylines in streaming environments and came up with some novel techniques for keeping the categorical skyline maintained. Skyline query processing involves categorical data in partially ordered domains. In order to obtain and keep track of the skyline set, they build an index structure on a grid based on topological sorting.
Constrained bandwidth Skyline queries in mobile environments are the subject of [60] research. The authors assume an ad hoc network in which each wireless device calculates its own local skyline using its own dataset and reports its findings to a central coordinator, who handles the merging process and calculates the final skyline result. Each device only reports a portion of its local skyline set to meet the bandwidth constraint, producing an approximation of the final skyline set.
In [61], the authors propose a distributed skyline (FDS) algorithm for geographically dispersed servers on the Internet that relies on feedback and makes no assumptions about the underlying overlay network. In order to calculate the skyline, the algorithm must make multiple trips. When a server communicates with other servers directly, it often uses too much bandwidth because its goal is to calculate the precise skyline. The authors also cover dealing with point deletion and addition. A group of local servers contains the underlying data set, which is partitioned horizontally. FDS uses a multi-round feedback mechanism to eliminate potentially unqualified skyline candidates in an effort to conserve network bandwidth. The delay (in terms of query response time) of FDS grows significantly as the size of the network grows.
Sun et al. [24] investigate the optimization of monitoring skyline queries on distributed data streams. They significantly contribute to the subject by developing and evaluating a novel algorithm named BOCS (Based on Changes of Skylines). This technique uses the GridSky approach as its foundation and incorporates it into a distributed data stream architecture to address the challenges of real-time skyline query processing. Sun et al. introduce the GridSky algorithm, which is a tree-based indexing method specifically developed for distributed data streams in a master–slave computing cluster. In this configuration, slave nodes perform calculations on their own subsets of data to determine local skylines. A central master node receives these local skylines and incorporates them into the overall skyline through incremental updates. The main benefit of GridSky is its ability to eliminate unnecessary data at both slave and master nodes, resulting in a decrease in communication overhead between them. The BOCS algorithm expands upon GridSky by providing a comprehensive framework for the distribution of data streams. BOCS involves the production of data objects by numerous servers, with a central server responsible for evaluating queries by combining data from different servers. GridSky employs a system where each remote server stores a local skyline and only sends updates (delta skylines) to the central server at regular intervals. This strategy exploits the skyline operator’s additive property to minimize the amount of communication required, transmitting only the data points that affect the local skylines. The efficacy of BOCS in effectively managing the dynamic nature of data streams is evident. The BOCS system ensures that the central server receives only the essential data updates. The system achieves this by maintaining local buffers at remote locations and computing the changes in the data skyline with each timestamp update. The central server combines these increments to update the overall skyline. This approach not only reduces connectivity costs but also improves the overall efficiency of skyline query processing in distributed contexts. This category of skyline operators using time-based windows demonstrates the BOCS algorithm’s resilience. BOCS tackles the important problem of monitoring skylines across time, rather than just calculating them at certain points in time, by emphasizing incremental updates and efficient communication methods. The capacity to continuously monitor is crucial for applications that necessitate real-time data processing and analysis. Ultimately, the study offers a well-crafted resolution to the issue of handling skyline queries over distant data streams. The integration of GridSky into the BOCS framework significantly reduces communication overhead and enhances query processing efficiency. This study makes a substantial contribution to the area by offering a scalable and efficient method for organizing and querying large-scale distributed data streams in real-time.
Das Sarma et al. [13] introduce a set of novel algorithms that efficiently calculate skylines on data streams. These techniques overcome the limits of single-pass algorithms in the sliding window paradigm by providing worst-case performance guarantees. The authors contend that one-pass algorithms are very limiting and demonstrate the infeasibility of devising an effective skyline technique that handles each tuple only once. Das Sarma, Lall, Nanongkai, and Xu were the first to formally investigate the skyline problem in the multi-pass streaming model. The authors emphasize that the straightforward O(nh)time output-sensitive technique, which necessitates O(h) space and O(h) passes (where h is the number of skyline points), may be wasteful. To address this issue, they propose a novel randomized algorithm that requires significantly fewer iterations, specifically O(hlogn) space and O(logn) iterations with a high probability for any constant dimension d. Sarma et al. [13] specifically developed the output-sensitive randomized RAND algorithm to effectively calculate skylines. It achieves this by iteratively reducing the dataset in a number of steps. Every iteration involves conducting three scans of the current collection of points P. In the first scan, authors randomly selected a sample set S with a size of M. The algorithm replaces certain samples in S with points with greater pruning capability in the second scan, ensuring that all samples at the end of this scan are part of the skyline and potentially eliminated from P. The final scan eliminates points that any given sample primarily influences. This process continues until P is empty. RAND is highly efficient when the skyline is small, with a predicted time complexity of I/Os, where represents the number of skyline points. Sarma et al. present a thorough investigation where they put forward a deterministic approach for 2-dimensional space. This algorithm operates in I/Os, provided that the memory holds a minimum of words. The RAND method, which utilizes a randomized multi-pass approach, consists of three distinct phases per pass. Each phase requires three separate reads of the input file. In the initial stage, reservoir sampling is employed to fill a window of the largest possible size with input items that are randomly selected (In terms of objects, M is the memory size, and B is the block size). The second step involves processing the input file block by block. During this process, any window objects that are dominated are replaced with new objects. This ensures that all window objects in the file are skyline objects. In the third phase, it conducts inverse one-way dominance checks to exclude input objects that either dominate or correspond to the same item in the window. it then uses the remaining objects as input for the next run. In their paper, Das Sarma et al. did a thorough study of multi-pass streaming skyline computation. This provided us with a solid foundation for making Skyline algorithms work better and handle larger amounts of data. To address the challenges posed by streaming data, randomized approaches and iterative pruning processes provide practical solutions and verifiable performance guarantees.
Vlachou et al. [62] created SKYPEER to cut down on the amount of data that needs to be sent. It uses a threshold method to handle subspace skyline queries in a distributed environment, sending requests for these queries among peers. In order to reduce the amount of data transferred, SKYPEER converts the multi-dimensional data into one-dimensional values and applies a thresholding scheme. Expanding on its predecessor, prioritizes the efficient routing of skyline queries across the super-peer network in an effort to cut down on the total number of super-peers contacted.
In [63], the SkyPlan technique was introduced as a means of enhancing the operational efficiency of PaDSkyline [64]. Similar to the PaDSkyline approach, during query processing, individual peers provide the querying peer with a collection of minimum bounding rectangles (MBRs) that serve as a simplified version of their respective data. The SkyPlan system activities address the task of generating execution plans, which establish the order in which to execute a given query. The query plan directly affects the speed of Skyline query processing. Consecutive peer querying allows us to avoid contacting certain peers if a locally stored point dominates all their points.
The skyline computation technique for multiple queries, known as FAST, was introduced in [65]. This method is designed to process multiple continuous skyline queries across a data stream. The FAST algorithm employs a filtering mechanism to promptly eliminate an object that lacks an opportunity to belong to any coming skyline of continuous queries. Additionally, it utilizes a discriminant to effectively identify the skyline objects for which queries are applicable from the objects stored in memory.
The study authors, Lee et al. [66], have proposed a novel form of categorical skyline queries that allows users to progressively refine their preferences through an interactive preference elicitation framework. Researchers have found that this approach enhances the size of the reduced skyline. In order to reduce the skyline’s size, authors employ the strategy of considering a reduced number of dimensions in dominance tests.
The authors, Xia et al. [67] introduced a novel data structure, termed “compressed skycube” (CSC), to facilitate the execution of concurrent and unexpected subspace skyline queries in databases that receive regular updates. The fundamental concept entails a relationship between the minimum subspaces X, based on set inclusion, and each tuple t, such that it is a member of Sky(X). The evaluation of a Sky(X) query involves the initial computation of the union of sets of tuples, where each tuple t is associated with Y being a subset of X. then it submits the obtained set of tuples to an evaluation using a standard skyline procedure. The CSC methodology utilizes a presented incremental maintenance procedure.
Lee and Hwang [68] used a region-level lattice to determine the cost-optimal pivot point. The BSkyTree algorithm that Lee and Hwang proposed takes into account both dominance and incomparability. This approach effectively bypasses any unnecessary comparisons. Lee and Hwang have proposed two instantiations, namely BSkyTreeS and BSkyTreeP. These instantiations can be regarded as optimized algorithms in sorting- and partitioning-based schemes, respectively. To optimize both dominance and incomparability relationships, BSkyTree used point-based space partitioning.
Skyline queries are becoming increasingly popular for facilitating multi-criteria analysis in large datasets. The computation of a multidimensional subspace skyline was at the center of Lee and Hwang’s [69] study. The authors proposed a method to reduce the full-space skyline, thereby enabling users to take into account the subspace skyline that aligns with their preferences. The authors have identified that there is potential for additional enhancements and have suggested a more effective algorithm based on space partitioning for the computation of Skycube, which they have named QSkycube. The focus of Skycube and QSkycube is to develop efficient methods for calculating multiple subspace skyline queries. The Lattice and Skycube data structures are utilized for the purpose of managing complexity. Using a tree-based structure, the QSkyCube algorithm employs a hierarchical approach to generate the skyline for individual cuboids, or subsets of dimensions, through the utilization of a tree-based structure. Nevertheless, the size of the lattice may prove excessive when attempting to construct a skycube.
Skyline queries become more challenging when the required attributes are distributed across multiple tables that must be joined together first. This scenario is common in real-world applications, particularly in financial data analysis. Bai et al. [70] emphasize the main obstacles in handling skyline-join queries in distributed databases, including the need for effective filtering of unfavorable tuples and the achievement of optimal parallelism without causing a bottleneck node. Conventional approaches frequently struggle to effectively manage these tasks, especially when confronted with substantial amounts of data distributed across several nodes. DSJQ tackles these problems by employing a two-phase strategy: DSJQ candidate set selection and DSJQ skyline computation. During the initial stage, two filtering techniques are employed: group filtering and bloom filtering. Group filtering is a computationally efficient operation that is performed locally, while bloom filtering is designed to reduce data transmission and prevent the master node from becoming a bottleneck, even if it involves communication between nodes. In the second phase, the skyline of the linked tables is computed. DSJQ utilizes a rotation-based scheduling scheme to evenly divide the computational burden among all nodes, effectively preventing the bottleneck issue sometimes found in other approaches. This approach guarantees that the algorithm can handle larger data sets and a greater number of nodes without any decrease in efficiency.
Most WSN skyline query algorithms use a pruning strategy [71] to decrease the energy cost of data transmission between nodes. For multidimensional sensing data, Wang et al. [71] offer an energy-efficient skyline query method, and they also introduce several effective data reduction techniques like the dynamic filter and the tuple-cutting strategy, among others. Additionally, such an algorithm is only useful for handling skyline queries and cannot be used for regaining access to original sensory datasets. The technique employs a node cut strategy to dynamically produce filtering tuples rather than issuing filtered queries in order to collect query results.
A novel technique, HashSkyline [72], has been introduced for executing parallel skyline queries on high-dimensional datasets utilizing GPUs, with a focus on correlation awareness. The HashSkyline algorithm employs a technique based on hashing to enhance the efficiency of skyline queries for datasets that exhibit correlation.
Han et al. [54] have proposed a method for processing continuous skyline queries on large datasets. Their method entails retrieving tuples from dynamically sorted lists in a round-robin fashion up until an early termination criterion is satisfied, then computing the continuous skyline results.
Huang et al. [73] made improvements to the CKNNSQ algorithm proposed in [74] to enable its implementation in a dynamic road network. The improved algorithm utilized a combination of three data structures to facilitate the efficient updating of Knearest skyline objects (KNSOs).
In the work of [75], researchers have implemented the lattice of cuboids algorithm as a means of minimizing the computational costs associated with conducting skyline queries on data streams with high dimensions. However, the time complexity of these methods tends to increase as the number of dimensions increases. The present study aims to effectively calculate skyline queries featuring multiple attraction points across more than two dimensions while maintaining a predetermined upper limit on time complexity.
To address the challenge of executing skyline queries over incomplete data streams, Ren et al. [76] introduced an efficient algorithm (SkyiDS). The algorithm utilizes several advanced techniques, such as Differential Dependency (DD) Rules to impute missing attributes in objects within the incomplete data stream, effective pruning strategies like spatial pruning, max-corner pruning, and min-corner pruning to significantly reduce the search space of the SkyiDS problem, and cost-model-based index structures like data synopses and skyline tree (ST) indexes, which support data imputation through DD rules and the continuous maintenance of SkyiDS candidates for skyline computation. Additionally, SkyiDS constructs an index structure (Ij) over a complete data repository (R) to facilitate the imputation of missing data. The algorithm uses both the data synopsis ST and indexes Ij to keep an eye on SkyiDS query results from the incomplete data stream. This makes imputation and query processing work together more efficiently.
Group skyline computation
Im et al. [17] conducted a study on the group skyline query, which relies on the dominance relationship between groups of equal size. The evaluation of the dominance relationship is conducted based on the collective attributes’ values. The authors found the dominance relationship between groups by using the aggregate points in the conventional way, with a single aggregate point representing each group. The aggregate point is the result of applying an aggregate function to the attribute values of each individual point in the set. However, only a small subset of aggregate functions, such as SUM, MIN, and MAX, have been discussed in prior works for calculating aggregate points. To aggregate the points, [17] employed the SUM function, which intuitively represents the total sum of a group’s individual strengths. Due to the inability to capture all Pareto optimal groups, the skyline groups constructed using these methods are incomplete. Using the same attribute values of k points to form a group, [17] defined and studied the group skyline query and then compared the dominance relation between the groups using conventional dominance. Some aggregate functions, like SUM, were the most popular among the calculate functions used in these tasks. In practice, it can be challenging to determine which aggregate function is best.
The design of Guo et al. [77] addresses the intricacies of dynamic data streams, where objects constantly enter and exit the system, necessitating constant updates to query results. Previous research had only looked at skyline queries for static datasets or single objects in streams. This paper expands the problem to item groups, which makes it more difficult because of how group formation and dominance checks work. The authors propose the use of data structures like hash tables, dominance graphs, and matrices to store dominance information and reuse it efficiently. This reuse of dominance information is crucial for reducing the computational overhead associated with frequent updates required by the dynamic nature of data streams. By incrementally updating candidate objects and skyline groups, the proposed algorithms avoid the need for complete recomputation with each new data point, which significantly enhances performance. However, the paper also has some limitations. The focus on synthetic datasets for experimental validation, while useful for controlled comparisons, may not fully capture the variability and complexity of real-world data streams. Also, the paper mostly looks at the sliding window model for data streams. Despite its widespread use, this model may not be suitable for all types of streaming data applications, particularly those requiring complex windowing mechanisms or handling rapidly incoming large amounts of data. Despite being an interesting way to manage candidate objects, the paper doesn’t fully explore how to use and scale up the proposed dominance graph in very large and complex data streams.
Yang et al. [78] investigated flexible group skyline queries, which return a group of points based on group dominance. Their paper work introduces significant advancements by incorporating user preferences into Flexible Group Skyline (FGSky) queries, enhancing computational efficiency and user interactivity over traditional methods. The paper addresses the limitations of traditional group skyline queries by proposing FGSky queries, which directly consider user preferences, making the results more practical and tailored. It introduces novel algorithms and pruning strategies to enhance query performance and presents comprehensive experiments demonstrating the efficacy of these methods. By integrating user preferences, FGSky queries extend traditional group skyline queries, providing more relevant and tailored results. When processing FGSky queries, proposed pruning strategies reduce the search space, eliminating irrelevant data points early in the computation process. These algorithms [78] quickly deliver intermediate results, proving advantageous for interactive applications that require users to view partial results prior to the full computation. This improves the user experience and allows for more dynamic decision-making. Algorithms such as Extended GMDS (EGM) and Layered Minimum Dominance Graph (LMDG) build on existing methods by integrating user preferences and optimizing performance. The GCQ and LCQ algorithms generate candidate groups and refine them based on user preferences, demonstrating improved efficiency through comprehensive experimental validation.
Spatial skyline queries
Given a set P of data points and a set Q of query points, the spatial skyline query returns those points of P that are not spatially dominated by any other point of P. The Euclidean distance between the data objects and all query points is used to determine the spatial dominance [79].
As far as we know, the first people to investigate the spatial skyline query analysis problem were Sharifzadeh and Shahabi [79]. Specifically, they proposed two index-based algorithms-the (branch and bound spatial skyline) algorithm and the (Voronoi-based spatial skyline) algorithm-to process this query quickly and effectively. uses a top-to-bottom traversal of the conventional Rtree spatial index to look for candidate points along the spatial skyline. After identifying a candidate point on the spatial skyline, uses the ’s expansion to reach the node with the smallest mindist distance to the identified data point and then verifies the node’s dominance relative to the other candidate points. Because of the high computational cost, Son et al. [80] modified by performing fewer spatial dominance tests.
Wang et al. [81] noted that due to the exponential growth of spatial data, it is no longer feasible to use a single processing node to answer the spatial skyline query on massive spatial datasets. To deal with the typical spatial skyline query on massive datasets, they developed a state-of-the-art parallel framework based on Map Reduce. The method begins by determining the convex hull of the query points. The authors then suggested the idea of independent regions, which they use to classify input data points into distinct groups. In the end, it calculates the local spatial skyline in each autonomous region simultaneously and build the global spatial skyline from the union of all the local spatial skyline sets.
Huang [82] created the continuous dSkyline query by combining two delicate data structures-the object attribute control matrix and the road distance sorting list-to deal with the problem of attributes changing over time in the road network. This made it possible to do location-based queries for objects whose attributes change over time.
In recent years, the spatial skyline problem has expanded in various dimensions, including but not limited to uncertainty and semantics. Elmi and Min [83] have devised effective methodologies for computing the spatial skyline over imprecise data regarding a collection of query points situated in diverse locales. The utilized data structures are R-tree and Voronoi diagrams.
Siddique et al. [84] present an innovative approach to the computation of k-dominant skylines in spatially distributed systems using multicore processors. This method addresses the limitations of centralized skyline computation, which struggles with scalability and single points of failure. By distributing the kdominant skyline computation across multiple cores, the authors demonstrate a significant improvement in processing efficiency and scalability. The authors extend k-dominant skyline computation from centralized to distributed datasets, effectively balancing the computational load among multiple cores. This load balancing is critical, as the experiments indicate that high-power cores take less time compared to lower-power ones, underscoring the importance of efficient resource utilization. The response time improves with the increase in the number of cores, and the algorithm handles large datasets effectively, proving the method’s scalability and robustness. One of the main drawbacks of the current implementation is that it only allows each computing core to handle one dataset at a time. This constraint limits the method’s efficiency when dealing with multiple datasets concurrently, a scenario common in real-world applications. The authors acknowledge this limitation and suggest it as a direction for future research. Another critique lies in the assumption of homogeneous computing environments in some of the experiments. While the paper does include heterogeneous simulations, the primary focus on homogeneous environments may not fully capture the complexities and variances encountered in practical, real-world distributed systems. Moreover, the communication overhead between cores, especially in heterogeneous environments, is not deeply explored, which could be a significant factor affecting the overall performance.
Wenly et al. [81] present a parallel solution for the NSSQ problem utilizing the MapReduce technique. The authors employ a grid-based partitioning scheme to recursively divide the dimensions of the data into multiple parts. The framework presented applies to the utilization of MapReduce in the computation of independent regions following the generation of a query’s convex hull. The ultimate skyline is the combination of all the regions that constitute the skyline.
The spatial-GPU [85] methodology employs a multi-level approach to separate autonomous regions for the purpose of examining potential data points. Skyline queries are evaluated concurrently in various autonomous regions. Hence, this approach has the capability to operate concurrently with GPUs or MapReduce. This methodology has been specifically developed for a specific kind of skyline query known as spatial skylines. The solution’s applicability to general skyline queries is not readily extensible.
Chen and Lian [86] proposed the metric skyline query as an extension of the spatial skyline query. This query involves computing the dynamic attributes of each data object using a set of dimension functions. The dissimilarity between spatial and metric skyline queries originates mainly from the utilization of diverse distance functions which encompasses not only the Euclidean distance function but also other metric functions.
Fuhry et al. [87] identified noteworthy correlations in the metric space between the skyline query and other commonly used similarity queries, including the nearest neighbor and range queries. The techniques introduced by the authors leverage these relationships in order to effectively eliminate non-skyline points from the search space. Based on the aforementioned discovery, the authors put forth optimized algorithms that aim to decrease the frequency of dominance tests and the computation of costly metric distance functions during the process.
Range-based skyline queries
The range skyline search is a query type that is oriented towards preferences and combines the characteristics of the conventional range query and the point-based skyline query. Huang et al. [52] introduced the concept of continuous monitoring of the conventional point-based skyline. This approach involves the constant movement of both the query point and the data points along a line at a uniform speed in all dimensions. Their suggested solution avoids having to calculate the skyline from scratch each time. Instead, potential changes are made each time or after new incoming tuples, ensuring that the skyline query result is updated and continuously accessible. The method can be especially helpful in real-time applications like video games and digital war systems, such as when a player in a field-fighting game wants to keep an eye on the enemies who are moving close to them and pose the greatest threat in terms of a number of factors (health, firepower, tactical acumen, and so on). Tao et al. [88] introduced the concept of distance-based representative skyline queries (DRS), aiming to select a subset of skyline points that effectively represent the distribution of the entire set. This approach evaluates the “representativeness” of a chosen set using a distance metric, with the goal of minimizing the maximum distance between non-representative skyline points and their closest representatives. Their method clusters skyline points and selects the center point of each cluster as the representative subset. The proposed DRS query minimizes the representation error, defined as the maximum Euclidean distance between a skyline point and its closest representative. They developed a dynamic programming algorithm to efficiently solve this problem in 2D space, but found it to be NPhard in higher dimensions, necessitating an approximate polynomial-time algorithm with a 2 approximation ratio. Additionally, they proposed the maximum-dominance (MaxDom) method to find top-k dominant representative skyline points, which involves identifying skyline points with the most dominated data points. Their work provides insights into the challenges and strategies for selecting a limited subset of skyline tuples to represent the full skyline effectively.
Fu et al. [89] investigated continuous range-based skyline queries in road networks, modeling the approximate location of moving queries as ranges rather than points. They defined the domination relation between data points based on their distance to the query and monotonic order in available attributes, focusing on the static skyline in spatial networks. To address this, they proposed two algorithms: the landmark-based algorithm (LBA) and the index-based algorithm (IBA), leveraging a weighted network Voronoi diagram and a high-performance multilevel range search query process. These algorithms efficiently retrieve objects within specified regions in the searching area. Their research extended to handle continuous range-based skyline queries over moving objects, introducing incremental versions of LBA and IBA. This enables applications like gas station recommendation in road networks. Fu et al. defined domination relationships based on non-spatial attributes and shortest path distances to query points, considering both static and dynamic scenarios.
Hidayat et al. [90] tackle the challenge of efficiently updating results for moving skyline and topk queries within dynamic environments. This issue is critical because of the increasing use of location-based services enabled by smartphones and GPS technology. The authors introduce safe zone-based techniques aimed at reducing the frequency of updates, which is vital for real-time applications. One of the work’s key strengths [90] is its innovative use of safe zones, which ensures that query results remain stable as long as the query stays within a predefined area, significantly reducing the need for continuous updates. Extensive experimental evaluations demonstrate the efficiency of the developed algorithms, highlighting substantial performance improvements over naive methods. A solid theoretical foundation underpins the methodology, detailing novel optimizations and algorithmic strategies for constructing safe zones for both skyline and topk queries. Additionally, the flexibility and applicability of these techniques are demonstrated through their extension to various distance metrics and multidimensional spaces. The paper’s experimental rigor is a notable strength, with experiments conducted on real datasets comparing the proposed methods against a lower-bound cost (supreme algorithm), demonstrating that the techniques perform near the theoretical optimum. The detailed results strongly support the efficiency and scalability of the methods. The contributions of this paper are significant both academically and practically. It extends the field of spatial queries by applying safe zone-based techniques to moving skyline and topk queries. The development of generic pruning techniques and safe zone construction algorithms applicable to various monotonic scoring functions represents a major advancement in multicriteria decision-making research. Practically, these methods offer developers of location-based services a means to enhance the responsiveness and efficiency of query processing systems, potentially improving user experiences and reducing operational costs. The techniques are particularly valuable for applications requiring high-frequency updates, such as autonomous driving systems, real-time recommendation systems, and location-aware alert systems. The study conducted by Lin et al. [91] was the first study to focus on range-based skyline queries in mobile environments. The authors presented two algorithms, namely ISKY and NSKY. The ISKY algorithm is based on an index. The proposed approach involves the computation of the skyline scope for individual objects based on their dominance relations prior to query processing. Authors utilize the MXCIf quadtree to index the skyline scope, which is independent of the query issuer. By traversing the index tree, a range-based skyline query can be efficiently processed. NSKY is an algorithm that is not indexed and is designed to handle highly dynamic data sets. It achieves this by converting a range-based skyline query into multiple segment-based skyline queries. Their methodology is only relevant to Euclidean space due to the difference between the road network and Euclidean space.
By handling the query as a range in every dimension rather than as a point, Wang et al. [92] were the first to propose an algorithm for processing dynamic skyline queries. A data point can only be in the range skyline given the query range if it is not dynamically dominated by any other data point with respect to every query point in the given range. The method prunes the vast majority of data points that cannot be included in the dynamic skyline using a grid index and a variation of the well-known Zorder curve. The final range skyline set is able to be obtained utilizing a non-index skyline processing technique.
Tzouramanis et al. [93] take into account that a data point can be in the range skyline if it is not dynamically dominated by any other data point in relation to any specific query point in the d-dimensional range and thus is not required to be in relation to all of the query points in the range. The second point is that their suggested solution specifies the sub-region (within the given range) to which a data point may belong in the range skyline set. In addition, their method does not require the implementation of additional procedures or the use of non-index skyline processing algorithms to remove false hits. Instead, it uses a traditional spatial index to prune all the data points that do not belong to the dynamic skyline in relation to the given range.
Fu et al. [89] look into continuous range-based skyline queries (CRSQs) in road networks, and they propose two effective algorithms: landmark-based (LBA) and index-based (IBA). They define the dominance relation between two data points based on both their distance to the query and the monotonic order in any other attribute, and then compute the skyline by modeling the approximate location of the moving query as a range on the road network.
Reverse skyline queries
Dellis and Seeger [94] pioneered the concept of reverse skyline queries within a monochromatic framework, defining it as the retrieval of points whose dynamic skyline includes the query point. This notion is based on dynamic skyline definitions, establishing the reverse skyline query RSL(q, D) for a dimensional dataset D and query point q. Their work introduced algorithms like BBRS and RSSA for reverse skyline processing, often leveraging for skyline determination. However, the MapReduce framework’s limitations hinder the parallelization of such algorithms due to the absence of distributed functionality. To address this, methods such as rskyquadtrees were proposed, introducing midpoints for effective pruning. Dellis and Seeger [94] expanded the concept to various domains, including reverse skyline queries from both customer and manufacturer perspectives. These queries retrieve data objects or products based on dynamic skyline considerations. While their work primarily focuses on precise data, handling uncertain databases poses challenges due to the variable attributes represented by uncertainty regions. Furthermore, Dellis and Seeger explored scenarios where attributes are absolute distances to the query point, proposing the reverse skyline concept based on dynamic skyline sets. They introduced the global skyline as an upper bound for the reverse skyline set and developed a branch-and-bound algorithm to identify reverse skyline points by verifying each candidate against window queries.
Lian and Chen [95] introduced the first algorithm for bichromatic reverse skyline queries, designed specifically for uncertain datasets represented by probability distribution functions (pdf). Their algorithm extends the definition of reverse skyline to accommodate bichromatic scenarios and presents efficient computation techniques. The study looks at probabilistic reverse skyline queries (PRSQ) in both one-color and two-color situations. In these queries, an object’s dynamic skyline contains a query point with a probability threshold. In the monochromatic case, PRSQ identifies objects with dynamic skylines containing the query point, while in the bichromatic case, it returns points from one dataset whose dynamic skyline on another dataset contains the query point. The proposed algorithm uses Rtrees to index uncertain regions of each point, as well as spatial or probabilistic pruning methods to reduce the search space. The final phase computes probabilities for the remaining regions to produce the definitive result. The approach treats uncertain data as hyperrectangles, reflecting the uncertainty inherent in certain attributes, such as price or distance ranges. This modeling allows for a broader range of applications, such as in real estate, where buyers may specify price or distance ranges rather than exact values.
Wu et al. [96] studied bichromatic reverse skyline queries in precise datasets and came up with ways to make Rtree access orders more efficient so that I/O costs were much lower. Their research primarily focuses on centralized scenarios, proposing the bichromatic reverse skyline algorithm (BRS) to address traditional datasets composed of precise points. Their work shows how important it is to optimize the access order in Rtree structures to increase pruning power and make bichromatic reverse skyline retrieval work better.
Lian and Chen [97] investigate efficient and accurate probabilistic reverse skyline query processing on monochromatic and bichromatic data. There is a probability distribution function for each object. The probabilistic reverse furthest skyline (PRFS) was proposed in [97], which is an extension of the work from [95] and takes into account the scenario in which preference minimization is sought-after rather than maximization. Another proposal is a twist on the probabilistic reverse skyline (PRS) query that provides the k most probable objects. The authors also looked into a ranking method for PRS query results called topk reverse skyline, which retrieves the k points with the most dynamic skylines at the highest probabilities.
Using DCtrees as an index and the pruning technique for reducing the search space, Zhu et al. [98] proposed a reverse skyline query algorithm. Sensor node power is extremely valuable in wireless sensor networks (WSNs). In such conditions, the reverse skyline algorithm must not only perform well but also minimize power consumption. Due to the constant flow of new information, it can be challenging to keep an index on a dynamic dataset up-to-date. In this manner, they implemented DCRS, a divide-and-conquer algorithm for reverse skyline retrieval on data streams. The DCRS employs efficient pruning methods to reduce the search space by using the DCTree as the index.
Power-grid transmission lines, for instance, are prone to icing in ice-disaster warning systems under conditions like low wind speed, low temperature, and high humidity. To detect iced or soon-to-be iced transmission lines, the centralized control sends a skyline query of the speed, temperature, and humidity dimensions to sensor networks installed in the power grid’s transmission lines. For the multiple-objective decision analysis of these sensing data, skyline query algorithms are widely adopted as one of the main monitoring methods in safe production monitoring and disaster early-warning applications [1]. In [1], the authors investigate the issue of WSN reverse skyline computation. The reduction of wasteful transmissions is the foundation of their proposed energy-efficient strategy. The authors also made some evaluations regarding range and reverse skyline queries.
Lim devised a parallel Skyline query technique that employs angle partitioning of data sets [99] to achieve excellent load balancing. The algorithm initially projects the input into a hypersphere and subsequently employs the angle coordinates to perform the meshing process.
Banaei Kashani investigated the SSP algorithm [100], a distributed version of the Skyline query algorithm used in the BATON overlay network. The algorithm arranges the baton network’s nodes in a balanced binary tree. There is a range of data objects that each node is responsible for processing. The SSP algorithm maps multi-dimensional data to a dimensional order address using the Z-order address.
Tables 4, 5 and 6 summarize several developed skyline queries and their characteristics over complete and incomplete datasets.
Table 4. An example of approaches to sequential continuous skyline techniques over a complete data stream
Approach year | Computation Space | Data type | Technique | Query Type | Processing Type | Window type | Data partitioning |
|---|---|---|---|---|---|---|---|
Lazy, Eager 2006 [8] | Partial | Complete | – | skyline | Centralized | Both | – |
LookOut 2007 [51] | – | Complete | – | skyline | Centralized | Time-based | – |
MSQ 2008 [101] | – | Complete | metric index | metric skyline | Centralized | – | – |
FSQW 2009 [102] | – | Complete | – | Frequent skyline | Centralized | Time-based | Angle-based |
BOCS 2010 [24] | Partial | Complete | Index R-tree | skyline | Distributed | time-based | horizontally split data |
mMR-SUDS 2013 [103] | – | Complete | – | – | Cloud | Time-based | – |
skyline group 2015 [104] | – | Complete | – | group skyline | Centralized | Time based | – |
DSJQ 2016 [70] | – | Complete | – | Skyline join | Distributed | – | – |
k-LDS, PBA, -greedy 2016 [105] | – | Complete | – | k Representative | Centralized | Count based | – |
LookOut, Lazy, Eager 2016 [106] | – | Complete | – | skyline | Centralized | Both | – |
Gsky 2018 [107] | Partial | Complete | Clustering | top-k,multi-Query | Centralized | Count-Time | grid-based |
BJR-tree, NDcache 2019 [108] | – | Complete | Index BJR-tree | skyline | Centralized | - | Object-based |
NSCt framework 2020 [15] | – | Complete | – | Negative-skyline | Centralized | Both | – |
IMSS, OIMSS, PMSS 2020 [109] | – | Complete | – | Mutual skyline | Centralized | – | – |
Dynamic k-dominant Skyline 2021 [110] | – | Complete | Clustering | k-dominant | Centralized | – | – |
Table 5. Examples of approaches to sequential continuous skyline techniques over Incomplete and Uncertain data streams
Approach year | Computation space | Data type | Technique | Query type | Processing type | Window type | Data partitioning |
|---|---|---|---|---|---|---|---|
Replacement, Bucket, Iskyline 2008 [20] | Full space | Incomplete | Clustering Non-Index | Skyline | Centralized | Count-Time | - |
OURS 2009 [111] | – | Uncertain | – | Probabilistic Skyline | Centralized | – | – |
SSKY, MSKY, QSKY 2009 [112] | – | Uncertain | – | q-skyline | Centralized | Both | – |
DCRS 2009 [98] | – | Uncertain | DC-tree index | Reverse skyline | Distributed | – | – |
TDG algorithm 2010 [113] | – | Uncertain | Clustering | Probabilistic | Centralized | Time based | – |
RBSSQ 2016 [114] | Full space | Incomplete | Clustering Index | Skyline | Centralized | – | – |
SIDS 2013 [115] | Full space | Incomplete | Sorting Index | Skyline | Centralized | – | – |
SPQ 2017 [116] | Partial | Incomplete | Sorting Index | Skyline | Distributed | – | – |
ESB, UBB, BIG, IBIG 2015 [117] | Full space | Incomplete | Clustering Index | Top–k | Centralized | – | – |
pnN, pmnN 2015 [118] | – | Uncertain | – | n-of-N Skyline | Centralized | Time based | – |
Baseline, DAG, BIB 2016 [119] | Full space | Incomplete | Clustering Index | K-dominant | Centralized | – | – |
Uncertain reverse skyline 2017 [120] | – | Uncertain | R-tree Index midpoint based | Bichromatic Reverse | Distributed | – | Angle-based |
GSS+ 2018 [83] | – | Incomplete | - | Spatial skyline | Centralized | – | – |
PFSIDS 2021 [121] | Partial | Incomplete | Clustering | top-k,multi-Query | Centralized | Count-Time | Angle-based |
CrowdSJ, PSJCrowd, ASJCrowd 2021 [122] | – | Incomplete | - Index | Skyline join | Distributed | – | – |
Table 6. Example of approaches to parallel continuous skyline techniques
Approach year | Data type | Technique | Skyline Variant | Processing Type | Window Variant | Data partitioning |
|---|---|---|---|---|---|---|
CMS, AMS, DMS, APS 2013 [123] | Uncertain | – | Probabilistic | Cloud | – | – |
SPM, APM, DPM 2014 [124] | Uncertain | – | skyline | Cloud | count based | – |
DPF, PSS 2014 [33] | Uncertain | grid index | skyline | Centralized | – | – |
JRtree skyline 2015 [125] | complete | sorting | skyline join | Distributed | – | – |
APSS,CPSS,CUDA-Based PSS 2015 [126] | Complete | – | Probabilistic skyline | Centralized | – | – |
skyline Manhattan distance 2015 [127] | Complete | – | – | Centralized | – | – |
parallel eager 2016 [5] | Complete | – | skyline | Centralized | Both | – |
tree 2016 [128] | Uncertain | Quad Tree Indexing | Reverse | Distributed | – | – |
HashSkylineGPU 2017 [39] | complete | Clustering | hash skyline | Centralized | - | - |
uncertain reverse skyline 2017 [120] | Uncertain | R-tree Index midpoint based | bichromatic Reverse | Distributed | - | Angle-based |
parallel uncertain n-of-N 2018 [129] | Uncertain | – | n-of-N skyline | Centralized | – | – |
NP-SWJ, IP-SWJ 2018 [9] | complete | Clustering | skyline join | Distributed | – | – |
parallel uncertain n-of-N sky 2019 [130] | Uncertain | – | n-of-N skyline | Centralized | – | – |
PKDS | Uncertain 2019 [131] | Capability Index | k-dominant | Centralized | – | – |
pruning+monitoring reverse sky 2019 [99] | Uncertain | - | Reverse | Distributed | - | Angle-based |
dySky 2021 [132] | Complete | – | skyline | Centralized | – | – |
Basic, Parallel_Basic, Parallel_PS 2021 [133] | Complete | – | skyline | Centralized | – | – |
Problem definition
Within this section, we will present the key definition and theorem essential for understanding the subsequent discussions.
Dimension indexing
The RSS (Range Search Skyline) method relies on the indexing of dataset dimensions based on the total order of each dimension. This enables obtaining the skyline without the need for dominance comparisons involving the entire dataset or the entire current skyline [11]. While determining skyline tuples, notice that the name Range Search represents the bounded search range [11].
To find the skyline of the dataset in Table 3 using a basic skyline technique, we would compare smartphones based on each feature. A smartphone is considered in the skyline if it is not worse than any other smartphone in all features and strictly better in at least one feature. Let’s consider the comparison:
we’ll use the dominance relationship. A Phone
The Final Skyline:
Definition 4
(Dimension Index [11]) Let be an -dimensional( attributes) continuous dataset. The dimension index A of is constructed by the set of sorted collections of index entries. such that each index entry represents a single tuple, that includes a header pointer that points to the header of the tuple, the value at this dimension and a dimension pointer that points to the dimension value in the next dimension of the same tuple. All index entries of the same dimension , , are sorted by value with the preference order .
Figure 3 demonstrates the data structure of the dimension index, all index entries of the same tuple are linked on all dimensions . the node represents the header of the tuple . for example, represents the value of a tuple on the dimension 2. This data structure of linked index entries permits quick tuple browsing from any dimension.
[See PDF for image]
Fig. 3
Dimension index system
Theorem 1
(conditions under which a tuple is classified as a skyline tuple [11]) Given A the attribute(dimension) index of an dimensional continuous dataset , let be a random sub-index of A, and x be a tuple. Therefore, x is a skyline tuple iff where and one of the conditions below is respected: (1) where , or (2) where
Figure 4 illustrates the dimension indexing of RSS algorithm for multidimensional dataset in Table 3 that includes 6 sub-indexes and 10 tuples in all. The dashed lines represents dimension links between index entries of the same tuple.
[See PDF for image]
Fig. 4
Dimension index for multidimensional dataset in Table 3
Based on Theorem 1, we can determine from and independently that is a skyline tuple. From dimension , we obtain both and there is no tuple in this dimension that is better than and , for which both and must be checked to find if and are skyline tuples (actually we have and ).
If we use Theorem 1 only on dimension , is directly a skyline tuple, additionally we have , so is not a skyline tuple; if we continue down, the next entry is not skyline tuple none of them until we reach , where , therefore is a skyline tuple. Finally, with dimension only, 3 consecutive entries include the dimension value 550 so these 3 tuples have to be first locally compared to extract local skyline tuples, in this example, , then we apply Theorem 1. this skyline tuples locally identified in dimensions and are called the local skyline. clearly, each distinct dimension value is a local skyline.
To maintain the correctness of the skyline when new incoming tuples are received based on Theorem 1 Liu et al. [11] introduced a technique to update the dimension index, as illustrated in Fig. 5, where they assumed that all values at each dimension are distinct. To keep the example simple, the symbol represents the rest of the index entries (values). Let’s assume an incoming tuple enters our dataset, We start by locating the index entry positions of on all dimensions with reference to the preference order (as the blue star pattern entries). Next we identify a lower-bounded dimension that includes the lowest number of skyline tuples (lower bounded skyline) provided that . Then we use theorem 1 to compare the incoming tuple with these lower-bounded skyline tuples and check if is a skyline tuple. As illustrated in Fig. 5 comparing the incoming tuple to the lower-bounded skyline in dimension , where and are incomparable, making a skyline tuple. Theorem 1 can be used to filter any existing skyline tuples that could be dominated by the incoming tuple .
[See PDF for image]
Fig. 5
Example of dimension indexing
If is a skyline tuple, we identify the upper-bounded dimension of that contains the lowest number of skyline tuples where . If for every tuple in the upper-bounded skyline, later will be cleared out of the skyline, else There is no need for upper-bounded skyline identification. In Fig. 5, skyline tuples and are dominated by , so they will be cleared out of the skyline.
To identify the upper-bounded and lower-bounded dimensions for an incoming tuple , Liu et al. [11] used the following estimation equation:: dimensional value of the incoming tuple x on dimension i. Here When the number of dimensions (attributes) increases, parallelizing the building of the dimension index and the identification of upper-bounded and lower-bounded dimensions in addition to the parallel insertion and removal of tuples will boost the whole skyline computation’s performance.
Parallel RSS over data stream
Parallel range search skyline PRSS
This section outlines the process of developing a parallel version of the sequential RSS algorithm, which we have named parallel RSS (PRSS). Within PRSS, the arrival of new tuples triggers the incremental update of the skyline and the removal of expired tuples. PRSS updates the skyline incrementally by adding each incoming tuple when the window is not yet full. The algorithm efficiently removes expired tuples, changes dominance relationships, and adjusts the skyline for filled windows. PRSS utilizes dominance lists to keep track of tuples that are directly dominated, enabling efficient updates when tuples are added or removed.
PRSS examines the collection of expired skyline tuples to determine if any dominated tuples could potentially be recent skyline tuples. Subsequently, eliminate the tuples that have reached their expiration date and incorporate the newly inputted tuples into the window. This algorithm 1, referred to as PRSS, discusses the concept of PRSS. The inputs for this process include a data stream T, a window size Z, and a dimensionality N (representing the number of attributes), the start and end of the chunk given for each processor core. As the window Z moves over the data stream T, the algorithm maintains the updated skyline index S after it receives each tuple x.
PRSS begins by initializing two indices: Dimension Index : An empty index used to manage the tuples based on their dimensions (Used to store the index entries for each dimension). Skyline Index Skyline Index : An empty set that will store and keep track of the current skyline tuples. PRSS continuously processes tuples from the data stream T. For each new tuple x read from the stream, the following steps are performed:
The technique of PRSS begins by removing expired tuples from the dimension index A (lines 5–17). The function PRSS retrieves the set V of all expired tuples based on the type of the window Z (either count-based or time-based) at line 4. The set E consists of all tuples that are directly dominated by each expired skyline tuple (line 7). This identification is done using the parallelDominatedTuples method. Subsequently, the variable v is deactivated in A to guarantee that subsequent dominance tests remain unaffected (Mark v as inactive in the dimension index A). Disabling and Removing Tuples will ensure the integrity of the index and skyline by properly handling the expired tuples and maintaining an accurate skyline set. The PRSS function utilizes the parallelRangeSearch algorithm (Algorithm 2) to identify additional skyline tuples for each tuple (lines 9–13). If the element v belongs to the set S, the PRSS algorithm removes v from the skyline index S at line 14. In order to stop the deletion process of the tuple, PRSS permanently deletes v from the set A at line 16. Following that, PRSS initiates the process of inserting incoming tuples into A (lines 18–25). The PRSS function is called on line 18 to determine whether x is a skyline tuple using the PRangeSearch algorithm. If the incoming tuple x is a skyline, the PRSS algorithm calls the parallelDominatedTuples function to retrieve the tuples R that are directly dominated by x. It then proceeds to simultaneously remove each tuple in parallel, as indicated in lines 19-22. Ultimately, the PRSS appends the value x to the set S at line 23 and concurrently inserts x into the set A at line 25. The attributes of an incoming tuple x will be distributed among the CPU cores (workers), as shown in Fig. 6. If PRSS is unable to retrieve any expired tuples, it will promptly proceed with the insertion of new tuples.
[See PDF for image]
Fig. 6
Load balancing on cores
The algorithm determines whether a given tuple x is a skyline tuple by checking against the current dimension index A and updating the skyline index S as necessary. The algorithm begins by initializing the skyline index S.
If x passes the initial dominance check, the algorithm retrieves the skyline tuples S for the lower bounded dimension. It iterates over each skyline tuple s in S. If any of these skyline tuples dominate x, the algorithm returns false. Lower Bounded Range Search: It identifies the lower bounded dimension for the tuple x using the LowerBoundedDimension function. This dimension helps in locating the relevant blocks of tuples that could potentially dominate x. Determine the lower bounded dimension “lower”: this function identifies the dimension with the least value where x is bounded.
Summary of PRSS steps: 1. Initialization: Set up empty indices for dimensions and skylines. 2. Process Each Tuple in Data Stream: Identify and handle expired tuples, If an expired tuple is in the skyline, update the skyline by re-evaluating dominated tuples. Insert the incoming tuple, Determine if it should be part of the skyline. Update the dimension index and skyline accordingly.
Key Functions: : Identifies tuples that have expired based on the window W. DominatedTuples(A, v): Finds tuples directly dominated by a given tuple. PRangeSearch(A, x): Checks if a tuple is a skyline tuple and updates necessary indices.
The method is demonstrated in Algorithm 2, which concurrently determines the upper and lower bounds of the skyline for an incoming tuple x in the dimension index A. Upon the arrival of a new tuple x in our datasets, the PRangeSearch algorithm initiates by determining the dimension with the lowest lower bound for x (as outlined in Algorithm 3) in order to perform dominance tests (line 2).
According to Theorem 1, PRangeSearch employs the embedded BNL algorithm to compare x with all tuples B that share the same value at the same dimension (line 3). The algorithm retrieves the block of tuples B from the dimension index A corresponding to this lower bounded dimension. () The block is a subset of tuples in the dimension index A that are relevant for comparison with x in the lower dimension.
If x is determined to be a local skyline, the function PRangeSearch will return true. Otherwise, the function will stop executing (lines 4–6). uses a Block Nested Loop (BNL) method to check if any tuple in block B dominates x. If any tuple in B dominates x, the algorithm returns false, indicating that x is not a skyline tuple. BNL checks if x can be a skyline tuple within the block B. If BNL returns false, x is not a skyline tuple, and the function exits early.
If x is a local skyline, the PRangeSearch algorithm also concurrently identifies the lower-bounded skyline S for the tuple x (line 7), and subsequently compares S with x to determine if x is a skyline. If the condition is not met, the PRangeSearch function terminates immediately (lines 8–12), and returns false.
If x is identified as a skyline tuple, the PRangeSearch identifies the upper bounded dimension for x (line 13) using the UpperBoundedDimension(A, x) function in order to eliminate skyline tuples that are ultimately dominated by x. This helps in locating another set of relevant blocks of tuples.
The skyline tuples b that have the same dimension value as x (lines 14–20) need to be removed from the skyline index. PRangeSearch Fetches the block B associated with the upper bounded dimension: It retrieves the block of tuples B from the dimension index A corresponding to this upper bounded dimension. the block is a subset of tuples in the dimension index A that are relevant for comparison with x in the upper dimension. Check each tuple b in block B: For each tuple b in block B, if b is part of the current skyline and x dominates b, the algorithm updates the dominance list to reflect that x now dominates b. It then removes b from the skyline index S. The algorithm retrieves the skyline tuples S for the upper bounded dimension. It iterates over each skyline tuple s in S. If x dominates any of these skyline tuples, it updates the dominance list UpdateDominanceList(x, b) accordingly and removes the dominated skyline tuples from S. for each tuple b in B, if b is in the skyline index S and x dominates b (), update the dominance list for x and remove b from S.
The PRangeSearch algorithm identifies the upper-bounded skyline S of x (line 21) and removes any skyline tuple that is dominated by x (lines 22–27). If x is a skyline, the function PRangeSearch will return a value of true. Retrieve the skyline tuples in the upper bounded dimension: this function gets the set of skyline tuples that are relevant to the upper bounded dimension upper. Check dominance against tuples in S: For each skyline tuple s in S, if x dominates s (), update the dominance list for x and remove s from S.
PRangeSearch algorithm returns true indicating that x is a skyline tuple: If x passes all these checks without being dominated by any other tuple, and after updating the skyline index S as necessary, the algorithm concludes that x is a skyline tuple and returns true.
in summary PRangeSearch algorithm does Lower Bounded Range Search where it identifies the lower bounded dimension. Fetches relevant block and checks dominance using BNL. Compares against current skyline tuples and updates if necessary. Then does Upper Bounded Range Search where it identifies the upper bounded dimension. Fetches relevant block and compares against current skyline tuples. Updates dominance lists and skyline tuples as necessary. Finally Returns True: Concludes that x is a skyline tuple if all checks pass.
Key Functions: LowerBoundedDimension(A, x) / UpperBoundedDimension(A, x): Determines the lower/upper dimension bounds for x. GetBlock(x, A): Retrieves the relevant block of tuples for comparison. BNL(B, x): Checks if x can be a skyline tuple within the block B. LowerBoundedSkyline(A, x) / UpperBoundedSkyline(A, x): Retrieves the skyline tuples in the lower/upper bounded dimension. The UpdateDominanceList() function, as shown in Algorithm 2, is responsible for updating the dominance indexes when the incoming tuple dominates the existing skyline tuples. When the function UpdateDominanceList(x, b) is called, the dominance index of b is appended to the dominance index of x.
By systematically checking both lower and upper bounded dimensions, PRangeSearch algorithm ensures efficient skyline tuple maintenance by narrowing down the range of dominance checks and updating the skyline index continuously.
[See PDF for image]
Algorithm 1
PRSS (Parallel Range Search for Stream)
[See PDF for image]
Algorithm 2
Parallel RangeSearch
Optimization of the dominate() Function using AVX2
The dominate() function Indicates whether the first tuple dominates the second tuple. It Iterates through each element (dimension) of the tuples. Compares the corresponding elements of the two tuples: If the element in the first tuple (“*p1”) is greater than the element in the second tuple (“*p2”), the first tuple does not dominate, and “false” is returned. If the element in the first tuple is less than the element in the second tuple and dominance hasn’t been established (“!dominating”), it sets “dominating” to “true”. If the iteration completes without returning “false”, it means the first tuple dominates the second tuple if “dominating” is “true”. Returns the result.
Koizumi et al. [125] used Intel SSE/AVX instructions to decrease the number of instructions in the dominates() function. In our case we used the AVX2 which is an abbreviation for Advanced Vector Extensions 2, which serves as an extension to the x86 instruction set architecture. AVX2 is specifically engineered to enhance the speed of vectorized calculations, especially when it comes to SIMD (Single Instruction, Multiple Data) operations. AVX2 introduces vector registers that are 256 bits wide, which is double the width of the registers in SSE and AVX. This enables the concurrent processing of a greater number of data elements within a single instruction [134]. AVX2 is a component of a wider range of improvements in SIMD technology that focuses on utilizing parallelism in contemporary processors. SIMD instructions enable the execution of a single instruction on multiple data elements simultaneously, resulting in notable performance enhancements for specific computational tasks [125].
The function is an AVX2-optimized version of the original “dominate” function. It leverages AVX2 intrinsics to perform SIMD operations, processing multiple elements in parallel. Compares corresponding elements of two tuples using AVX2 vectorized instructions. Takes advantage of AVX2’s ability to execute multiple comparisons in a single instruction. Generates a mask indicating the dominance relationship for each element. Combines individual element comparisons using AVX2 intrinsics. This AVX2-optimized function enhances the efficiency of dominance checking in the context of skyline computations, leveraging advanced vectorized instructions for improved parallelism.
[See PDF for image]
Algorithm 3
dominate()
The time complexity of the following AVX2-optimized code is O(m/8), where ’m’ is the width of the vectors being compared. The implementation harnesses the power of Intel AVX2 instructions, which enable Single Instruction, Multiple Data (SIMD) operations for 256-bit operands. In the context of AVX2, the function initializes a 256bit AVX register () with all elements set to zero. This register, named dominating, is used to accumulate comparison results during the vectorized processing of the input vectors. The code iterates over the vectors in chunks of eight elements at a time, leveraging the 256-bit wide AVX2 registers. For each iteration, two 256-bit vectors (vec1 and vec2) are loaded from memory using the intrinsic. This enables simultaneous loading of eight double-precision floating-point values. The intrinsic is then employed to compare corresponding elements of vec1 and vec2. The comparison mode checks if the values in vec1 are greater than those in vec2. The resulting vector () holds the comparison outcomes.
To determine if any dominance exists in the current set of eight elements, a logical OR operation is performed between the ’dominating’ register and . This update ensures that if any element in the current chunk indicates dominance, it will be reflected in the ’dominating’ register. Following the vectorized comparison, the code checks if any element in the ’dominating’ register is non-zero using the intrinsic. If true, it means that no dominance was found in the current chunk, and the loop continues to the next set of eight elements. If false, the function returns false as dominance is detected. Finally, the function returns true if any element in the ’dominating’ register is non-zero after processing all chunks, indicating the presence of dominance in the entire vector. This AVX2-optimized implementation significantly improves performance by exploiting parallelism in the comparison operations, leading to reduced instruction count and enhanced efficiency in processing large datasets.
[See PDF for image]
Algorithm 4
Main Algorithm for Computing Skylines in Parallel
Parallel implementation details
Utilizing multi-core platforms can effectively reduce the execution time for continuous Skyline computation. We focus on studying optimization techniques that are specifically related to the distribution of computation across multi-core processors [127]. We examine the impact of various load-balancing strategies. In our proposed implementation, the workloads are distributed among the cores. Once a core completes its computation, it takes on a portion of the remaining tasks. Consequently, an effective load balancing is ensured. Furthermore, based on the number of cores that are available, we use two distinct solutions for loop iterations scheduling among threads.
Static Scheduling: iterations of the loop are divided into chunks at compile time. Each thread is assigned a fixed-size chunk of iterations to process before the loop begins. The chunk size remains constant throughout the execution of the loop. Static scheduling is often used when the workload per iteration is known to be uniform, and the overhead of chunk distribution is negligible compared to the overall computation.
Dynamic Scheduling: iterations of the loop are distributed dynamically among the available threads at runtime. Threads request work (iterations) dynamically from a shared pool of iterations until all iterations are completed. The chunk size may vary dynamically based on the workload of each thread and the availability of iterations in the pool. Dynamic scheduling can be beneficial when the workload per iteration is unpredictable or when the workload varies significantly across iterations.
Performance evaluation
Experimental setup
We assess the performance of our PRSS system using both count and time-based windows on synthetic and real data streams. The PRSS and RSS implementations are written in C++ and compiled using g++ (v11.3.0) with the -O3 optimization flag. The OpenMP API (v5.0) library is employed for implementing multi-threading algorithms.The simulations were executed on an Intel i7 4.2 GHz processor, which has 6 cores, along with 16 GB of DDR4 RAM. The operating system used was Windows 10. We assess the scalability and runtime efficiency of our algorithm by conducting a comprehensive experimental simulation, comparing it to the sequential RSS algorithm. The PRSS c++ source code has been published on GitHub as well.1 In order to compare our parallel algorithm with its sequential counterpart RSS, we employed the original implementation developed by the original author.2
Datasets:We employed the Skyline benchmarking data generator.3 The parameters utilized in the current skyline techniques are employed to directly compare the outcomes. As an illustration, we vary the dimensionality d, ranging from 8 to 64, and the cardinality, varying from n = 100k to 500k. In addition, we conduct benchmarking using real datasets, specifically the Covertype4 and weather5 datasets.
Covertype: The Covertype dataset contains data on cartographic variables, including elevation, proximity to the nearest roadway, and slope. The data is accessible for grid cells with dimensions of 30 m × 30 m in the Roosevelt National Forest located in Colorado, USA. Skyline points represent forested areas that have unique cartographic characteristics.
Weather: The Weather dataset offers monthly data on precipitation, along with the geographical coordinates (latitude and longitude) and elevation for 566,268 terrestrial locations across the globe. Each record corresponds to a cell that measures 10 degrees of latitude and longitude. A skyline record refers to a distinct pattern of months that experience unusually high levels of rainfall based on its three-dimensional location, with a preference for higher and northeastern areas.
Experimental results
Runtime efficiency comparison
Initially, a comparison is made between the run-time efficiency of PRSS and the sequential RSS method. The comparison results of the algorithms are displayed in Figures. Refer to Figs. 7, 8. The lines represent the quantity of attributes present in the datasets, while the bars indicate the duration of the running times, measured in seconds. The experiment utilized sliding window sizes ranging from 100k to 500k tuples for the count-based window, and from 8 to 64 for the dimensionality.
Effect of data dimensionality and cardinality : In Figs. 7 and 8 We systematically vary the cardinality from 50,000 to 500,000, with a high range of values. The algorithm’s execution times increase proportionally with . As anticipated, Parallel RSS demonstrates a notable improvement in speed when the dimensionality and cardinality of synthetic datasets are increased, as shown in Figs. 7 and 8. However, the key observation lies in the rate of this increase. Parallel RSS consistently outperforms its sequential counterpart, showcasing a notable improvement in speed. This improvement is particularly significant when dealing with datasets of higher dimensionality, as demonstrated by the steeper slopes in the execution time curves of PRSS. The parallelization strategy implemented in PRSS efficiently handles the increased computational load induced by higher dimensionality and cardinality, resulting in superior scalability.
Effect of real dataset: In order to demonstrate the efficiency of Skyline queries, it is necessary to evaluate them using real datasets that may contain duplicate values. In Table 7, using real data for weather condition monitoring with 15 attributes, our parallel RSS algorithm continues to demonstrate superior performance compared to the sequential RSS algorithm. Hence, our parallel version exhibits superiority not only on synthetic dataset but also on real-world data. In real-world scenarios where datasets may contain duplicate values and exhibit more complex structures, the parallel version of the algorithm maintains its efficiency. The ability of PRSS to handle real-world datasets underscores its practical applicability and reinforces the significance of the parallelization strategy employed.
Effect of number of threads: In Fig. 9, We showcase the capacity of PRSS to effectively utilize the complete parallel processing capabilities of the CPU. This is achieved by executing the parallel version of PRSS on a dataset with a fixed window size and dimensionality, while varying the number of CPU threads employed from 1 to 12. Observing a fixed dataset window size (100k) and dimensionality (8), it is evident that the execution time of PRSS decreases almost linearly with an increase in the number of threads. The decrease in execution time is nearly linear as the number of threads increases from 1 to 12, highlighting the efficiency of parallelization. This behavior indicates that PRSS can efficiently distribute the workload across multiple threads, minimizing idle time and maximizing CPU utilization. The scalability of PRSS is crucial in scenarios where computational resources can be dynamically allocated, providing a significant advantage over the sequential approach, especially in modern multicore architectures. This demonstrates the effective utilization of the CPU’s parallel processing capabilities by PRSS.
Consistency Across Diverse Datasets: One noteworthy aspect of PRSS’s performance is its consistency across diverse datasets. Whether dealing with synthetic datasets of varying dimensionality and cardinality or real-world datasets with complex structures, PRSS consistently outperforms its sequential counterpart. This consistency emphasizes the robustness of the parallelization strategy employed in PRSS, showcasing its adaptability to different data characteristics and scenarios.
In conclusion, the experimental results and expanded explanations collectively provide a comprehensive understanding of the effectiveness and advantages of the parallel skyline algorithm (PRSS) over its sequential counterpart. The consistent performance across diverse datasets, scalability with the number of threads, and applicability to real-world scenarios highlight the robustness and practicality of the proposed parallelization strategy (Figs. 7, 8, 9).
[See PDF for image]
Fig. 7
RSS vs PRSS with Anticorrelated, correlated, Independent count based window and differents Cardinality
[See PDF for image]
Fig. 8
RSS vs PRSS with Anticorrelated, correlated, Independent count based window and differents Dimensionality
Table 7. RSS vs Parallel RSS VS BskyTree with real world dataset
Data name | Window size | Dimensions number | RSS Time(sec) | PRSS Time(sec) | BskyTree time(sec) |
|---|---|---|---|---|---|
covertype | 100000 | 10 | 62.9839 | 41.3117 | 3.978 |
weather | 100000 | 15 | 54.6236 | 21.2429 | 140.563 |
[See PDF for image]
Fig. 9
Parallel PRSS Scalability with Anticorrelated count based window and differents Threads
Memory consumption comparison
It’s important to understand that efficiency in parallel computing is not solely determined by memory consumption. While higher memory usage may raise concerns, the efficiency of parallelization should be evaluated based on various factors, including speedup, scalability, and resource utilization.
Here are some points to consider:
Speedup: based on our Runtime Efficiency Comparison our parallel algorithm demonstrates significant speedup compared to the sequential version, it indicates that our parallelization approach is effective in utilizing multiple processing units. This speedup can lead to faster overall execution times, which is a primary goal of parallel computing.
Scalability: A well-designed parallel algorithm should scale efficiently with an increasing number of threads. Scalability refers to the ability of the parallel algorithm to maintain or improve performance as the system size grows. our parallel algorithm exhibits good scalability, thus the parallelization approach is well-suited for larger problem sizes.
Resource Utilization: While higher memory consumption may raise concerns, it’s essential to assess whether the memory usage is justified by the performance gains achieved through parallelization. Efficient utilization of available resources, including CPU cores and memory, is a key aspect of parallel computing. our parallel algorithm effectively leverages resources to achieve better performance, thus the higher memory usage is acceptable.
Trade-offs: Parallel computing often involves trade-offs between factors such as memory consumption, speedup, and scalability. It’s essential to consider these trade-offs in the context of our specific application requirements and system constraints. The increase in memory usage for PRSS algorithm has a reasonable trade-off for significant performance improvements.
Tuple Memory Requirements: Each d-dimensional tuple requires 24×d bytes to store all index entries, including values, header links, and dimension links. Additionally, a header for each tuple requires 20 bytes to store a tuple ID, a dominance list pointer, and a skyline flag.
Space Complexity of B-tree: The space complexity of the B-tree is O(N), where N is the number of nodes. Assuming 2 64-bit pointers per node to maintain the tree, the total memory required by the dynamic dimension index is calculated as 16 × N × (24 × d + 20) bytes.
Memory Requirement Example: The analysis provides an example stating that 100 MB of memory space is sufficient to handle a 10 K window of 20-dimensional streaming data.
[See PDF for image]
Fig. 10
Memory Consumption RSS vs PRSS with Anticorrelated, correlated, Independent with differents Dimensionality
Our parallel PRSS algorithm consumes more memory than the sequential RSS while running faster can be attributed to several factors:
Parallel Overhead: When we parallelize The RSS algorithm using OpenMP, it creates multiple threads to execute computation in parallel. Each thread requires its own stack space, which adds to the overall memory consumption. Additionally, OpenMP might allocate additional resources for thread management and synchronization, contributing to increased memory usage.
Data Duplication: In parallel computing, some data needs to be duplicated or partitioned across threads to ensure each thread has access to the required data independently. This replication or partitioning can lead to higher memory consumption compared to the sequential version, where data might be processed in-place.
Parallelization Overheads:
Each thread running the sequential RSS algorithm on a chunk of data will incur memory overhead for thread stack space and local variables used within the thread.
The number of threads spawned in parallel and the size of the data chunks processed by each thread will determine the overall memory consumption during parallel execution.
Combining Global Results:
After processing chunks of data in parallel, PRSS combines the global results outside the parallel region. This step requires additional memory for storing the aggregated results before the final skyline computation.
Comparison with BSkyTree
To assess the performance of our parallel algorithm, we compare it (i) to a baseline approach which computes the skyline using state of the art algorithm BSkyTree [30]. The goal of this comparison is to show that (i) without any index structure, the best skyline algorithm known so far is unable to handle multidimensional skyline queries when the dimensionality is moderately large in a streaming context on both synthetic datasets and real world datasets.
First, the effects of (1) dimensionality and (2) cardinality of data have been evaluated. For (1), the cardinality is fixed to 100K and for (2), the dimensionality of data is fixed to 16.
PRSS and RSS outperform BSkyTree in high dimensionality domains on AC and Corr data and Independent data but is less efficient than BSkyTree in low dimensionalities. PRSS systematically outperforms RSS and BSkyTree on all data. Figure 11 shows the effect of cardinality on PRSS with the highest dimensionality (d = 16) in our experiments. In high-dimensionality domains, PRSS outperforms RSS and BSkyTree in most cases (Fig. 12).
[See PDF for image]
Fig. 11
RSS vs PRSS vs BskyTree with Anticorrelated, correlated, Independent count based window and differents Cardinality
[See PDF for image]
Fig. 12
RSS vs PRSS vs BskyTree with Anticorrelated, correlated, Independent count based window and differents Dimensionality
We point out two observations from this experiment:
PRSS is faster with more than one order of magnitude in most cases where .
state of the art algorithm BSkyTree algorithm takes a lot of time to extract skyline when dimensionality thus it is unefficient in high dimensionality domains.
Conclusion
In this study, we carefully looked at how to parallelize a well-known sequential Skyline algorithm on a multicore processor, with the goal of making it more scalable and improving its performance. Our work introduces the Parallel Range Search Skyline (PRSS) algorithm, a highly efficient method designed to extract skylines from data streams. To optimize dominance tests within the sliding window, we have used a technique known as dynamic dimension indexing. We tested our PRSS algorithm in a number of different situations, using count-based sliding windows on both synthetic and real data streams that have a lot of different dimensions. The main reason for our study is to see how well parallelization works in multicore architectures. The fact that our method is faster than the sequential version and than state of the art algorithm BSkyTree shows how useful it is in real life. Addressing the inherent complexities of real-world data streams, we are committed to exploring future avenues, particularly the feasibility of implementing PRSS on GPU and FPGA systems. This research direction is geared towards scenarios involving incomplete and uncertain streaming data, thereby expanding the applicability of our algorithm.
Author contributions
PhD student Khames Walid designed the algorithm, implemented the program, analyzed the results, and prepared the manuscript. HadjAli Allel and Lagha Mohand as PhD supervisors reviewed the results and contributed to the manuscript corrections.
Funding
Our research was entirely self-funded, and we did not receive any financial support or funding from external parties.
Data availability
synthetic data: the standard skyline data generator. real data: public repository under creative commons license. We have also uploaded our PRSS algorithm C++ source code on GitHub.
Declarations
Conflict of interest
We declare that the authors have no Conflict of interest as defined by Springer, or other interests that might be perceived to influence the results and/or discussion reported in this paper.
Ethical approval
for PRSS Algorithm Development and Validation Using Synthetic and Open-Source Real-World Data. Research Objective: The objective of this study is to develop and validate a Skyline algorithm for processing streaming data. The research does not involve human or animal participants or the use of sensitive data. Instead, the algorithm was developed and validated using both synthetic data generated from the open-source skyline data generator software (http://pgfoundry.org/projects/randdataset) and real-world data available in a public repository under the Creative Commons license. Study Design: 1. Algorithm Development: The PRSS algorithm was developed using standard programming practices and techniques. The research team worked collaboratively to design, implement, and optimize the algorithm for processing streaming data. 2. Synthetic Data Generation: Synthetic data is generated using publicly available open-source software called the Standard Skyline Data Generator, which allows the creation of random data sets. This approach eliminates the need for any data collection involving human or animal participants. 3. Real-World Data Acquisition: The real-world data used for validation will be obtained from a public data repository with an open Creative Commons license. As such, no individual or private information will be involved in the study. Ethical Considerations: 1. Privacy and Data Usage: As there are no human or animal participants involved and the synthetic data is generated using open-source software, privacy concerns are not applicable in this research. 2. Compliance with Licensing Terms: The research team ensures full compliance with the licensing terms and conditions for using the real-world data available under the Creative Commons license. 4. Integrity and Transparency: The research team commits to conducting the study with integrity, transparency, and adherence to established scientific practices. Ethical Approval Process: 1. PhD student Khames Walid and the supervisors Allel Hadjali and Mohand Lagha prepared a comprehensive research proposal outlining the study design, data sources, privacy measures, and ethical considerations involved in the algorithm development and validation process. 2. The proposal was submitted to the ethics committee of the aeronautical institute of Blida for review. Given that the research does not involve human or animal participants or sensitive data, the ethics committee will focus on ensuring compliance with open-source software licensing terms. The Ethics Committee of the Aeronautical Institute of Blida has reviewed and approved our research. Since our study did not involve human participants, animals, or sensitive data, the committee ensured that all ethical principles and guidelines were followed throughout the research process. All authors involved in this research have provided their consent and agreement to publish the findings. 3. Upon approval, the research team proceeds with the algorithm development and validation process using the synthetic and open-source real-world data. 4. Research results are provided to the ethics committee to ensure continued adherence to ethical guidelines throughout the research.
https://github.com/avionicscode/multicore-prss.
2https://github.com/skyline-sdi/sdi-rss.
3http://pgfoundry.org/projects/randdataset.
4https://doi.org/10.24432/C50K5N.
5https://crudata.uea.ac.uk/cru/data//hrg/tmc/.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
1. Wang, G; Xin, J; Chen, L; Liu, Y. Energy-efficient reverse skyline query processing over wireless sensor networks. IEEE Trans. Knowl. Data Eng.; 2011; 24,
2. Reddy, P.C., Babu, A.S.: Survey on weather prediction using big data analystics. In: 2017 Second International Conference on Electrical, Computer and Communication Technologies (ICECCT), pp. 1–6. IEEE (2017)
3. Chen, J., Lyu, Z., Liu, Y., Huang, J., Zhang, G., Wang, J., Chen, X.: A big data analysis and application platform for civil aircraft health management. In: 2016 IEEE Second International Conference on Multimedia Big Data (BigMM), pp. 404–409. IEEE (2016)
4. Kertiou, I; Benharzallah, S; Kahloul, L; Beggas, M; Euler, R; Laouid, A; Bounceur, A. A dynamic skyline technique for a context-aware selection of the best sensors in an iot architecture. Ad Hoc Netw.; 2018; 81, pp. 183-196. [DOI: https://dx.doi.org/10.1016/j.adhoc.2018.08.011]
5. De Matteis, T; Di Girolamo, S; Mencagli, G. Continuous skyline queries on multicore architectures. Concurr. Comput.; 2016; 28,
6. Alami, K., Maabout, S.: Multidimensional skylines over streaming data. In: International Conference on Database Systems for Advanced Applications, pp. 338–342. Springer (2019)
7. Khames, W., Hadjali, A., Lagha, M.: Skyline computation on multicore architectures: a survey. In: 2020 International Conference on Data Analytics for Business and Industry: Way Towards a Sustainable Economy (ICDABI), pp. 1–6. IEEE (2020)
8. Tao, Y; Papadias, D. Maintaining sliding window skylines on data streams. IEEE Trans. Knowl. Data Eng.; 2006; 18,
9. Zhang, J., Gu, J., Cheng, S., Li, B., Wang, W., Meng, D.: Efficient algorithms of parallel skyline join over data streams. In: International Conference on Algorithms and Architectures for Parallel Processing, pp. 184–199. Springer (2018)
10. Borzsony, S., Kossmann, D., Stocker, K.: The skyline operator. In: Proceedings 17th International Conference on Data Engineering, pp. 421–430. IEEE (2001)
11. Liu, R., Li, D.: Dynamic dimension indexing for efficient skyline maintenance on data streams. In: International Conference on Database Systems for Advanced Applications, pp. 272–287. Springer (2020)
12. Lu, H., Zhou, Y., Haustad, J.: Continuous skyline monitoring over distributed data streams. In: International Conference on Scientific and Statistical Database Management, pp. 565–583. Springer (2010)
13. Das Sarma, A; Lall, A; Nanongkai, D; Xu, J. Randomized multi-pass streaming skyline algorithms. Proc. VLDB Endow.; 2009; 2,
14. Balke, W.-T., Güntzer, U., Zheng, J.X.: Efficient distributed skylining for web information systems. In: Advances in Database Technology-EDBT 2004: 9th International Conference on Extending Database Technology, Heraklion, Crete, Greece, March 14–18, 2004 9, pp. 256–273. Springer (2004)
15. Alami, K; Maabout, S. A framework for multidimensional skyline queries over streaming data. Data & Knowl. Eng.; 2020; 127, 101792. [DOI: https://dx.doi.org/10.1016/j.datak.2020.101792]
16. Papadopoulos, AN; Tiakas, E; Tzouramanis, T; Georgiadis, N; Manolopoulos, Y. Skylines and other dominance-based queries. Synth. Lect. Data Manag.; 2020; 15,
17. Im, H; Park, S. Group skyline computation. Inform. Sci.; 2012; 188, pp. 151-169.2873668 [DOI: https://dx.doi.org/10.1016/j.ins.2011.11.014]
18. Sheng, C., Tao, Y.: On finding skylines in external memory. In: Proceedings of the Thirtieth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pp. 107–116 (2011)
19. Endres, M., Glaser, E.: Indexing for skyline computation: a comparison study. In: Flexible Query Answering Systems: 13th International Conference, FQAS 2019, Amantea, July 2–5, 2019, Proceedings 13, pp. 31–42. Springer (2019)
20. Khalefa, M.E., Mokbel, M.F., Levandoski, J.J.: Skyline query processing for incomplete data. In: 2008 IEEE 24th International Conference on Data Engineering, pp. 556–565. IEEE (2008)
21. Gao, Y; Miao, X; Cui, H; Chen, G; Li, Q. Processing k-skyband, constrained skyline, and group-by skyline queries on incomplete data. Expert Syst. Appl.; 2014; 41,
22. Zhang, S., Mamoulis, N., Cheung, D.W.: Scalable skyline computation using object-based space partitioning. In: Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, pp. 483–494 (2009)
23. Papadias, D., Tao, Y., Fu, G., Seeger, B.: An optimal and progressive algorithm for skyline queries. In: Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, pp. 467–478 (2003)
24. Sun, S; Huang, Z; Zhong, H; Dai, D; Liu, H; Li, J. Efficient monitoring of skyline queries over distributed data streams. Knowl. Inform. Syst.; 2010; 25,
25. Balke, W.-T., Güntzer, U., Zheng, J.X.: Efficient distributed skylining for web information systems. In: Advances in Database Technology-EDBT 2004: 9th International Conference on Extending Database Technology, Heraklion, Crete, Greece, March 14–18, 2004 9, pp. 256–273. Springer (2004)
26. Lo, E; Yip, KY; Lin, K-I; Cheung, DW. Progressive skylining over web-accessible databases. Data Knowl. Eng.; 2006; 57,
27. Tan, K.-L., Eng, P.-K., Ooi, B.C., et al.: Efficient progressive skyline computation. In: VLDB, vol. 1, pp. 301–310 (2001)
28. Lee, KC; Lee, W-C; Zheng, B; Li, H; Tian, Y. Z-sky: an efficient skyline query processing framework based on z-order. VLDB J.; 2010; 19, pp. 333-362. [DOI: https://dx.doi.org/10.1007/s00778-009-0166-x]
29. Selke, J., Balke, W.-T.: Skymap: a trie-based index structure for high-performance skyline query processing. In: International Conference on Database and Expert Systems Applications, pp. 350–365. Springer (2011)
30. Lee, J., Hwang, S.-w.: Bskytree: scalable skyline computation using a balanced pivot selection. In: Proceedings of the 13th International Conference on Extending Database Technology, pp. 195–206 (2010)
31. Koizumi, K., Eades, P., Hiraki, K., Inaba, M.: Bjr-tree: fast skyline computation algorithm for serendipitous searching problems. In: 2017 IEEE International Conference on Data Science and Advanced Analytics (DSAA), pp. 272–282. IEEE (2017)
32. Hose, K; Vlachou, A. A survey of skyline processing in highly distributed environments. VLDB J.; 2012; 21, pp. 359-384. [DOI: https://dx.doi.org/10.1007/s00778-011-0246-6]
33. Li, X; Wang, Y; Li, X; Wang, Y. Parallelizing skyline queries over uncertain data streams with sliding window partitioning and grid index. Knowl. Inform. Syst.; 2014; 41,
34. Im, H; Park, J; Park, S. Parallel skyline computation on multicore architectures. Inform. Syst.; 2011; 36,
35. Feng, X., Gao, Y., Jiang, T., Chen, L., Miao, X., Liu, Q.: Parallel k-skyband computation on multicore architecture. In: Asia-Pacific Web Conference, pp. 827–837. Springer (2013)
36. Chester, S., Šidlauskas, D., Assent, I., Bøgh, K.S.: Scalable parallelization of skyline computation for multi-core processors. In: 2015 IEEE 31st International Conference on Data Engineering, pp. 1083–1094. IEEE (2015)
37. Buanga, JPM; Badibanga, SN; Ilunga, RK. Enhanced parallel skyline on multi-core architecture with low memory space cost. Int. J. Comput. Sci. Issues (IJCSI); 2016; 13,
38. Zeng, Y; Li, K; Yu, S; Zhou, Y; Li, K. Parallel and progressive approaches for skyline query over probabilistic incomplete database. IEEE Access; 2018; 6, pp. 13289-13301. [DOI: https://dx.doi.org/10.1109/ACCESS.2018.2806379]
39. Zhu, H; Zhu, P; Li, X; Liu, Q; Xun, P. Parallelization of skyline probability computation over uncertain preferences. Concurr. Comput.; 2017; 29,
40. Papapetrou, O; Garofalakis, M. Monitoring distributed fragmented skylines. Distrib. Parall. Databases; 2018; 36, pp. 675-715. [DOI: https://dx.doi.org/10.1007/s10619-018-7223-7]
41. Wu, P., Zhang, C., Feng, Y., Zhao, B.Y., Agrawal, D., El Abbadi, A.: Parallelizing skyline queries for scalable distribution. In: Advances in Database Technology-EDBT 2006: 10th International Conference on Extending Database Technology, Munich, March 26–31, 2006 10, pp. 112–130. Springer (2006)
42. Gao, Y., Chen, G., Chen, L., Chen, C.: Parallelizing progressive computation for skyline queries in multi-disk environment. In: Database and Expert Systems Applications: 17th International Conference, DEXA 2006, Kraków, Poland, September 4-8, 2006. Proceedings 17, pp. 697–706. Springer (2006)
43. Wang, S., Ooi, B.C., Tung, A.K., Xu, L.: Efficient skyline query processing on peer-to-peer networks. In: 2007 IEEE 23rd International Conference on Data Engineering, pp. 1126–1135. IEEE (2006)
44. Wang, S; Vu, QH; Ooi, BC; Tung, AK; Xu, L. Skyframe: a framework for skyline query processing in peer-to-peer systems. VLDB J.; 2009; 18, pp. 345-362. [DOI: https://dx.doi.org/10.1007/s00778-008-0104-3]
45. Hose, K.: Processing skyline queries in p2p systems. In: VLDB 2005 PhD Workshop, pp. 36–40. Citeseer (2005)
46. Li, H., Tan, Q., Lee, W.-C.: Efficient progressive processing of skyline queries in peer-to-peer systems. In: Proceedings of the 1st International Conference on Scalable Information Systems, p. 26 (2006)
47. Huang, Z., Jensen, C.S., Lu, H., Ooi, B.C.: Skyline queries against mobile lightweight devices in manets. In: 22nd International Conference on Data Engineering (ICDE’06), pp. 66–66. IEEE (2006)
48. Li, J; Xiong, S et al. Efficient pr-skyline query processing and optimization in wireless sensor networks. Wirel. Sens. Netw.; 2010; 2,
49. Vlachou, A., Doulkeridis, C., Kotidis, Y.: Angle-based space partitioning for efficient parallel skyline computation. In: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, pp. 227–238 (2008)
50. Yu, J., Liu, X., Liu, G.-h.: A window-based algorithm for skyline queries. In: Sixth International Conference on Parallel and Distributed Computing Applications and Technologies (PDCAT’05), pp. 907–909. IEEE (2005)
51. Morse, M; Patel, JM; Grosky, WI. Efficient continuous skyline computation. Inform. Sci.; 2007; 177,
52. Huang, Z; Lu, H; Ooi, BC; Tung, AK. Continuous skyline queries for moving objects. IEEE Trans. Knowl. Data Eng.; 2006; 18,
53. Sacharidis, D., Bouros, P., Sellis, T.: Caching dynamic skyline queries. In: Scientific and Statistical Database Management: 20th International Conference, SSDBM 2008, Hong Kong, July 9-11, 2008 Proceedings 20, pp. 455–472. Springer (2008)
54. Han, X; Wang, B; Lai, G. Dynamic skyline computation on massive data. Knowl. Inform. Syst.; 2019; 59,
55. Zeighami, S., Ghinita, G., Shahabi, C.: Secure dynamic skyline queries using result materialization. In: 2021 IEEE 37th International Conference on Data Engineering (ICDE), pp. 157–168. IEEE (2021)
56. Wang, W., Li, H., Peng, Y., Bhowmick, S.S., Chen, P., Chen, X., Cui, J.: An efficient secure dynamic skyline query model (2020). arXiv preprint arXiv:2002.07511
57. Jin, W., Han, J., Ester, M.: Mining thick skylines over large databases. In: Knowledge Discovery in Databases: PKDD 2004: 8th European Conference on Principles and Practice of Knowledge Discovery in Databases, Pisa, September 20–24, 2004. Proceedings 8, pp. 255–266. Springer (2004)
58. Xin, J., Wang, G., Chen, L., Zhang, X., Wang, Z.: Continuously maintaining sliding window skylines in a sensor network. In: International Conference on Database Systems for Advanced Applications, pp. 509–521. Springer (2007)
59. Sarkas, N., Das, G., Koudas, N., Tung, A.K.: Categorical skylines for streaming data. In: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, pp. 239–250 (2008)
60. Vlachou, A., Nørvåg, K.: Bandwidth-constrained distributed skyline computation. In: Proceedings of the Eighth ACM International Workshop on Data Engineering for Wireless and Mobile Access, pp. 17–24 (2009)
61. Zhu, L; Tao, Y; Zhou, S. Distributed skyline retrieval with low bandwidth consumption. IEEE Trans. Knowl. Data Eng.; 2008; 21,
62. Vlachou, A; Doulkeridis, C; Kotidis, Y; Vazirgiannis, M. Efficient routing of subspace skyline queries over highly distributed data. IEEE Trans. Knowl. Data Eng.; 2009; 22,
63. Rocha-Junior, J.B., Vlachou, A., Doulkeridis, C., Nørvåg, K.: Efficient execution plans for distributed skyline query processing. In: Proceedings of the 14th International Conference on Extending Database Technology, pp. 271–282 (2011)
64. Cui, B., Lu, H., Xu, Q., Chen, L., Dai, Y., Zhou, Y.: Parallel distributed processing of constrained skyline queries by filtering. In: 2008 IEEE 24th International Conference on Data Engineering, pp. 546–555. IEEE (2008)
65. Lee, YW; Lee, KY; Kim, MH. Efficient processing of multiple continuous skyline queries over a data stream. Inform. Sci.; 2013; 221, pp. 316-337. [DOI: https://dx.doi.org/10.1016/j.ins.2012.09.040]
66. Lee, J; You, G-W; Hwang, S-W; Selke, J; Balke, W-T. Interactive skyline queries. Inform. Sci.; 2012; 211, pp. 18-35. [DOI: https://dx.doi.org/10.1016/j.ins.2012.04.007]
67. Xia, T; Zhang, D; Fang, Z; Chen, C; Wang, J. Online subspace skyline query processing using the compressed skycube. ACM Trans. Database Syst. (TODS); 2012; 37,
68. Lee, J; Hwang, S-W. Scalable skyline computation using a balanced pivot selection technique. Inform. Syst.; 2014; 39, pp. 1-21. [DOI: https://dx.doi.org/10.1016/j.is.2013.05.005]
69. Lee, J; Hwang, S-W. Toward efficient multidimensional subspace skyline computation. VLDB J.; 2014; 23, pp. 129-145. [DOI: https://dx.doi.org/10.1007/s00778-013-0317-y]
70. Bai, M; Xin, J; Wang, G; Zimmermann, R; Wang, X. Skyline-join query processing in distributed databases. Front. Comput. Sci.; 2016; 10,
71. Wang, Y; Wei, W; Deng, Q; Liu, W; Song, H. An energy-efficient skyline query for massively multidimensional sensing data. Sensors; 2016; 16,
72. Yu, B; Choi, W; Liu, L. Exploring correlation for fast skyline computation. J. Supercomput.; 2017; 73,
73. Huang, Y.-K., Lee, C.-P., Tsai, C.-Y.: Evaluating knn-skyline queries in dynamic road networks. In: 2018 27th Wireless and Optical Communication Conference (WOCC), pp. 1–2. IEEE (2018)
74. Huang, Y-K; Chang, C-H; Lee, C. Continuous distance-based skyline queries in road networks. Inform. Syst.; 2012; 37,
75. Rudenko, L., Endres, M.: Real-time skyline computation on data streams with sls: implementation and experiences (2018)
76. Ren, W; Lian, X; Ghazinour, K. Skyline queries over incomplete data streams. VLDB J.; 2019; 28,
77. Guo, X; Li, H; Wulamu, A; Xie, Y; Fu, Y. Efficient processing of skyline group queries over a data stream. Tsinghua Sci. Technol.; 2016; 21,
78. Yang, Z; Zhou, X; Li, K; Gao, Y; Li, K. Progressive approaches to flexible group skyline queries. Knowl. Inform. Syst.; 2021; 63, pp. 1471-1496. [DOI: https://dx.doi.org/10.1007/s10115-021-01562-8]
79. Sharifzadeh, M., Shahabi, C.: The spatial skyline queries. In: Proceedings of the 32nd International Conference on Very Large Data Bases, pp. 751–762 (2006)
80. Son, W., Lee, M.-W., Ahn, H.-K., Hwang, S.-w.: Spatial skyline queries: An efficient geometric algorithm. In: Advances in Spatial and Temporal Databases: 11th International Symposium, SSTD 2009 Aalborg, Denmark, July 8-+10, 2009 Proceedings 11, pp. 247–264. Springer (2009)
81. Wang, C; Wang, C; Guo, G; Ye, X; Philip, SY. Efficient computation of g-skyline groups. IEEE Trans. Knowl. Data Eng.; 2017; 30,
82. Huang, Y.-K.: Continuous d-skyline queries for objects with time-varying attribute in road networks. In: 2017 IEEE 31st International Conference on Advanced Information Networking and Applications (AINA), pp. 439–446. IEEE (2017)
83. Elmi, S; Min, J-K. Spatial skyline queries over incomplete data for smart cities. J. Syst. Architect.; 2018; 90, pp. 1-14. [DOI: https://dx.doi.org/10.1016/j.sysarc.2018.08.005]
84. Siddique, M.A., Zaman, A., Islam, M.M., Morimoto, Y.: Multicore based spatial k dominant skyline computation. In: 2012 Third International Conference on Networking and Computing, pp. 188–194. IEEE (2012)
85. Wang, W; Zhang, J; Sun, M-T; Ku, W-S. A scalable spatial skyline evaluation system utilizing parallel independent region groups. VLDB J.; 2019; 28, pp. 73-98. [DOI: https://dx.doi.org/10.1007/s00778-018-0519-4]
86. Chen, L; Lian, X. Efficient processing of metric skyline queries. IEEE Trans. Knowl. Data Eng.; 2008; 21,
87. Fuhry, D., Jin, R., Zhang, D.: Efficient skyline computation in metric space. In: Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology, pp. 1042–1051 (2009)
88. Tao, Y., Ding, L., Lin, X., Pei, J.: Distance-based representative skyline. In: 2009 IEEE 25th International Conference on Data Engineering, pp. 892–903. IEEE (2009)
89. Fu, X; Miao, X; Xu, J; Gao, Y. Continuous range-based skyline queries in road networks. World Wide Web; 2017; 20,
90. Hidayat, A; Cheema, MA; Lin, X; Zhang, W; Zhang, Y. Continuous monitoring of moving skyline and top-k queries. VLDB J.; 2022; 31,
91. Lin, X; Xu, J; Hu, H. Range-based skyline queries in mobile environments. IEEE Trans. Knowl. Data Eng.; 2011; 25,
92. Wang, W.-C., Wang, E.T., Chen, A.L.: Dynamic skylines considering range queries. In: Database Systems for Advanced Applications: 16th International Conference, DASFAA 2011, Hong Kong, April 22–25, 2011, Proceedings, Part II 16, pp. 235–250. Springer (2011)
93. Tzouramanis, T., Tiakas, E., Papadopoulos, A.N., Manolopoulos, Y.: The range skyline query. In: Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pp. 47–56 (2018)
94. Dellis, E., Seeger, B.: Efficient computation of reverse skyline queries. In: VLDB, vol. 7, pp. 291–302. Citeseer (2007)
95. Lian, X., Chen, L.: Monochromatic and bichromatic reverse skyline search over uncertain databases. In: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, pp. 213–226 (2008)
96. Wu, X., Tao, Y., Wong, R.C.-W., Ding, L., Yu, J.X.: Finding the influence set through skylines. In: Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology, pp. 1030–1041 (2009)
97. Lian, X; Chen, L. Reverse skyline search in uncertain databases. ACM Trans. Database Syst. (TODS); 2008; 35,
98. Zhu, L., Li, C., Chen, H.: Efficient computation of reverse skyline on data stream. In: 2009 International Joint Conference on Computational Sciences and Optimization, vol. 1, pp. 735–739. IEEE (2009)
99. Lim, J; Bok, K; Yoo, J. A continuous reverse skyline query processing scheme for multimedia data sharing in mobile environments. Multimedia Tools Appl.; 2019; 78,
100. Banaei-Kashani, F; Ghaemi, P; Movaqar, B; Kazemitabar, SJ. Efficient maximal reverse skyline query processing. GeoInformatica; 2017; 21, pp. 549-572. [DOI: https://dx.doi.org/10.1007/s10707-017-0302-5]
101. Chen, L., Lian, X.: Dynamic skyline queries in metric spaces. In: Proceedings of the 11th International Conference on Extending Database Technology: Advances in Database Technology, pp. 333–343 (2008)
102. Zhang, Z., Cheng, R., Papadias, D., Tung, A.K.: Minimizing the communication cost for continuous skyline maintenance. In: Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, pp. 495–508 (2009)
103. Hanning, W., Weixiang, X., Yang, J., Wei, L., Chaolong, J.: Efficient processing of continuous skyline query over smarter traffic data stream for cloud computing. Discrete Dynamics in Nature and Society 2013 (2013)
104. Wulamu, A., Li, H., Guo, X., Xie, Y., Fu, Y.: Processing skyline groups on data streams. In: 2015 IEEE 15th Intl Conf on Scalable Computing and Communications and Its Associated Workshops (UIC-ATC-ScalCom), pp. 935–942. IEEE (2015)
105. Bai, M; Xin, J; Wang, G; Zhang, L; Zimmermann, R; Yuan, Y; Wu, X. Discovering the representative skyline over a sliding window. IEEE Trans. Knowl. Data Eng.; 2016; 28,
106. Tzanakas, A., Tiakas, E., Manolopoulos, Y.: Skyline algorithms on streams of multidimensional data. In: East European Conference on Advances in Databases and Information Systems, pp. 63–71. Springer (2016)
107. Tang, Y; Chen, S. Supporting continuous skyline queries in dynamically weighted road networks. Math. Probl. Eng.; 2020; 127, 101792.
108. Koizumi, K; Eades, P; Hiraki, K; Inaba, M. Bjr-tree: fast skyline computation algorithm using dominance relation-based tree structure. Int. J. Data Sci. Anal.; 2019; 7,
109. Jiang, T; Zhang, B; Lin, D; Gao, Y; Li, Q. Efficient column-oriented processing for mutual subspace skyline queries. Soft Comput.; 2020; 24,
110. Zheng, Z; Ruan, K; Yu, M; Zhang, X; Wang, N; Li, D. k-dominant skyline query algorithm for dynamic datasets. Front. Comput. Sci.; 2021; 15,
111. Atallah, M.J., Qi, Y.: Computing all skyline probabilities for uncertain data. In: Proceedings of the Twenty-eighth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pp. 279–287 (2009)
112. Zhang, W., Lin, X., Zhang, Y., Wang, W., Yu, J.X.: Probabilistic skyline operator over sliding windows. In: 2009 IEEE 25th International Conference on Data Engineering, pp. 1060–1071. IEEE (2009)
113. Su, H.Z., Wang, E.T., Chen, A.L.: Continuous probabilistic skyline queries over uncertain data streams. In: International Conference on Database and Expert Systems Applications, pp. 105–121. Springer (2010)
114. Arefin, MS; Morimoto, Y. Skyline sets queries for incomplete data. AIRCC’s Int. J. Comput. Sci. Inform. Technol.; 2016; 4,
115. Bharuka, R., Kumar, P.S.: Finding skylines for incomplete data. In: Proceedings of the Twenty-Fourth Australasian Database Conference-Volume 137, pp. 109–117 (2013)
116. Wang, Y; Shi, Z; Wang, J; Sun, L; Song, B. Skyline preference query based on massive and incomplete dataset. IEEE Access; 2017; 5, pp. 3183-3192. [DOI: https://dx.doi.org/10.1109/ACCESS.2016.2639558]
117. Miao, X; Gao, Y; Zheng, B; Chen, G; Cui, H. Top-k dominating queries on incomplete data. IEEE Trans. Knowl. Data Eng.; 2015; 28,
118. Zhang, W; Li, A; Cheema, MA; Zhang, Y; Chang, L. Probabilistic n-of-n skyline computation over uncertain data streams. World Wide Web; 2015; 18,
119. Miao, X; Gao, Y; Chen, G; Zhang, T. k-dominant skyline queries on incomplete data. Inform. Sci.; 2016; 367, pp. 990-1011. [DOI: https://dx.doi.org/10.1016/j.ins.2016.07.034]
120. Islam, M.S., Rahayu, W., Liu, C., Anwar, T., Stantic, B.: Computing influence of a product through uncertain reverse skyline. In: Proceedings of the 29th International Conference on Scientific and Statistical Database Management, pp. 1–12 (2017)
121. Liu, C.-M., Pak, D., Ortiz Castellanos, A.E.: Priority-based skyline query processing for incomplete data. In: 25th International Database Engineering & Applications Symposium, pp. 204–211 (2021)
122. Ding, L; Zhang, X; Zhang, H; Liu, L; Song, B. Crowdsj: skyline-join query processing of incomplete datasets with crowdsourcing. IEEE Access; 2021; 9, pp. 73216-73229. [DOI: https://dx.doi.org/10.1109/ACCESS.2021.3079324]
123. Li, X., Wang, Y., Li, X., Wang, Y., Huang, R.: Parallelizing probabilistic streaming skyline operator in cloud computing environments. In: 2013 IEEE 37th Annual Computer Software and Applications Conference, pp. 84–89. IEEE (2013)
124. Li, X; Wang, Y; Li, X; Wang, Y. Parallel skyline queries over uncertain data streams in cloud computing environments. Int. J. Web Grid Serv.; 2014; 10,
125. Koizumi, K., Inaba, M., Hiraki, K.: Efficient implementation of continuous skyline computation on a multi-core processor. In: 2015 ACM/IEEE International Conference on Formal Methods and Models for Codesign (MEMOCODE), pp. 52–55. IEEE (2015)
126. De Matteis, T., Di Girolamo, S., Mencagli, G.: A multicore parallelization of continuous skyline queries on data streams. In: European Conference on Parallel Processing, pp. 402–413. Springer (2015)
127. Montahaie, E., Ghafouri, M., Rahmani, S., Ghasemi, H., Bakhtiar, F.S., Zamanshoar, R., Jafari, K., Gavahi, M., Mirzaei, R., Ahmadzadeh, A., et al.: Efficient continuous skyline computation on multi-core processors based on manhattan distance. In: 2015 ACM/IEEE International Conference on Formal Methods and Models for Codesign (MEMOCODE), pp. 56–59. IEEE (2015)
128. Islam, M.S., Liu, C., Rahayu, W., Anwar, T.: Q+ tree: An efficient quad tree based data indexing for parallelizing dynamic and reverse skylines. In: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, pp. 1291–1300 (2016)
129. Liu, J., Li, X., Ren, K., Song, J., Zhang, Z.: Parallel n-of-n skyline queries over uncertain data streams. In: International Conference on Database and Expert Systems Applications, pp. 176–184. Springer (2018)
130. Liu, J; Li, X; Ren, K; Song, J. Parallelizing uncertain skyline computation against n-of-n data streaming model. Concurr. Comput.; 2019; 31,
131. Li, X., Liu, J., Ren, K., Li, X., Ren, X., Deng, K.: Parallel k-dominant skyline queries over uncertain data streams with capability index. In: 2019 IEEE 21st International Conference on High Performance Computing and Communications; IEEE 17th International Conference on Smart City; IEEE 5th International Conference on Data Science and Systems (HPCC/SmartCity/DSS), pp. 1556–1563. IEEE (2019)
132. Alami, K., Maabout, S.: A partitioning approach for skyline queries in presence of partial and dynamic orders. In: Transactions on Large-Scale Data-and Knowledge-Centered Systems XLIX, pp. 70–96. Springer (2021)
133. Tai, LK; Wang, ET; Chen, AL. Finding the most profitable candidate product by dynamic skyline and parallel processing. Distrib. Parall. Databases; 2021; 39, pp. 1-30. [DOI: https://dx.doi.org/10.1007/s10619-021-07323-4]
134. Bøgh, K.S., Chester, S., Šidlauskas, D., Assent, I.: Template skycube algorithms for heterogeneous parallelism on multicore and gpu architectures. In: Proceedings of the 2017 ACM International Conference on Management of Data, pp. 447–462 (2017)
© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024.