Abstract
Big data management systems are in demand today in almost all industries and also serve as a foundation for training artificial intelligence. The use of heterogeneous polystores in big data systems means that tools within the same system rely on different data granularity and access control models. Harmonizing these components and implementing a common access policy are currently carried out by the security administrator by hand. This leads to a growing number of vulnerabilities, which in turn become frequent causes of data leaks. A review of the current state of automation and analysis of access control in big data systems reveals the lack of automation solutions for polystore-based systems. This paper addresses the problem of automated access control analysis in big data management systems. We formulate and discuss the main contradiction between the requirement of scalability and flexibility of access control and the increased workload on the security administrator, aggravated by the use of different data and access control models in system components. To solve this problem, we propose a new automated method for analyzing security policies based on a graph model, which reduces the number of potential vulnerabilities caused by incorrect management of big data systems. The proposed method uses the data lifecycle model of the system, its current settings, and the required security policy. The use of two-pass analysis (from data sources to data receivers and back) allows us to solve two problems: the analysis of the access control system for potential vulnerabilities and the check for business logic vulnerabilities. As an example, we consider the use of a developed prototype tool for security policy analysis in a big data management system.
INTRODUCTION
In recent years, the concept of big data has become an integral part of the modern digital economy. Sets of heterogeneous dynamic data are used not only by search engines and social networks but also by online trading platforms, telecom operators, the banking sector, e-government services, and even industrial enterprises.
In the general case, the big data ecosystem includes three architectural levels, each of which has its own interpretation of big data: its own terminology, specialists, methods, and technologies, as well as its own information security threats and information protection methods.
These are the infrastructure, logical (engineering), and conceptual (application) levels. At the infrastructure level, big data are generally associated with data processing centers, including the corresponding hardware, network infrastructure, and virtualization tools [1]. At the logical level, there is the concept of a big data management system, which is similar to the well-known concept of a database management system (DBMS). By analogy with data processing in a DBMS, this level is sometimes called data engineering. The logical level includes a new class of data storages, polystores (polydatabases), which are combinations of several heterogeneous DBMSs that form a unified data processing architecture [2]. Polystores are also complemented by stream processing tools and related services: queue brokers, workload balancers, etc. At the top (conceptual or application) level, we deal with the value of information for businesses, organizational cases of its use, data outsourcing, collaborative data processing, and other high-level tasks of data and knowledge management [3, 4].
The large number of data leaks in big data ecosystems calls for the accelerated development of information protection methods and tools for systems of this class [5, 6]. This paper focuses primarily on the security of the logical level of data processing, or the security of polystores, and particularly on solving the problem of automated access control analysis.
ACCESS CONTROL ANALYSIS IN BIG DATA POLYSTORES
Current research in the field of improving the efficiency of access control in heterogeneous big data systems is focused on searching for new access control models and the use of distributed ledger technologies to protect against data leaks.
Overview of Access Control Analysis Methods and Tools for Big Data Polystores
Currently, access control in big data systems is automated for tools and ecosystems based on a single data model (e.g., the Hadoop ecosystem and the key-value data model). For big data systems based on polystores, the problem of configuring and analyzing an integral access control system, taking into account all components and data granulation in them, is solved by hand. Over the past few years, researchers have been investigating this problem in several directions.
First, there are access control methods and tools based on one universal model. Despite the use of various approaches (role-based model [7–9], attribute-based model [10] and its modifications [11], knowledge-based approach [12], etc.), researchers have not yet been able to resolve the problem of different data granulation in polystores [13], so the problem of granular access control in this subclass of solutions remains open.
Only in 2023 was it shown that the adaptation of modern methods based on attribute-based access control (ABAC) provides consistent access at the system level, at least for a homogeneous infrastructure [14]. However, the proposed solution is suitable only for the Hadoop infrastructure based on the key-value data model. When it is ported to infrastructures of other classes, especially to polystores, the vulnerabilities of access rule transfer between system components with different data granulation, described, in particular, in [13, 15], remain.
Researchers also proposed some universal access control automation tools based on blockchain technology [16, 17]. The main problem with these tools is also the lack of an analytical mechanism to detect vulnerabilities in access control implementations, because access rules still have to be transferred to the internal access control systems of structured storages [15, 18].
Thus, existing solutions in the field of automation and analysis of access control in big data systems provide acceptable information security only for homogeneous systems, whereas in polystores, the harmonization of access control settings among heterogeneous components is carried out by hand, which leads to high time costs, high requirements for the qualification of the security analyst, and a significant number of errors.
Problems of Using Access Control Methods in Big Data Management Systems
Access control subsystems in heterogeneous big data management systems currently imply the joint operation of access control modules of individual data processing tools. In this case, traditional access models dominate, while security policies for heterogeneous components are configured entirely by hand. Automation tools are implemented only in homogeneous solutions, i.e., over individual combinations of system components within one family of solutions from one developer. However, specific features of big data, such as volume, diversity, and variability, call for new access control methods, which can be integrated into industrial products in the near future.
Let us consider some modern access control methods used in big data management systems from a perspective of access control automation and analysis in heterogeneous big data systems. Below we discuss methods proposed by different researchers or included in modern tools.
Role-based access control (RBAC) [7], including its modifications [8], currently remains the main access control method in big data systems. It is used in industrial DBMSs integrated into big data systems in manufacturing companies and corporations. The RBAC model is widely used in various frameworks (e.g., Apache Ranger and Apache Sentry [9]). Despite the simplicity and flexibility of this model, its main disadvantages are poor scalability and complex maintenance when the number of users (roles) is large.
ABAC is the second direction of research in the field of access control in big data management systems. The basic element of this method is an attribute: some characteristic of an object, subject, or environment. The idea of this method is to write access policies as rules that compare attributes of subjects, objects, the environment, or the connection [10]. The good scalability, flexibility, and universality of ABAC are offset by the complexity of its implementation and the need for separate support of unstructured data, which is provided by the ABAC extensions considered below.
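To illustrate the idea, the following minimal Python sketch encodes a single ABAC rule as a comparison of subject, object, and environment attributes; all attribute names and values here are hypothetical and are not taken from any of the cited tools.

```python
# Hypothetical ABAC decision: permit if the attributes of the subject, object,
# and environment satisfy the rule conditions (all names are illustrative).
def abac_permit(subject: dict, obj: dict, environment: dict, action: str) -> bool:
    # Example rule: analysts may read fragments of their own department
    # outside of a maintenance window.
    return (
        action == "read"
        and subject.get("role") == "analyst"
        and subject.get("department") == obj.get("department")
        and not environment.get("maintenance_window", False)
    )

print(abac_permit({"role": "analyst", "department": "lab"},
                  {"department": "lab", "sensitivity": "internal"},
                  {"maintenance_window": False},
                  "read"))  # True
```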
The problem of unstructured data is potentially solved by content-based access control (CBAC) [11] and knowledge-based access control (KBAC) [12]. These methods are even more flexible than RBAC and ABAC. The former uses semantic analysis to make access decisions, while the latter actually extends CBAC with automated search for keywords (attributes of objects as they are added to the system) and knowledge base maintenance.
Despite the fact that the last two methods are described only in theoretical studies, they clearly illustrate the direction of development of big data access control systems towards scalability and automation (which reduces the workload on the security administrator). It should be noted that none of the considered methods is universal, i.e., each solution is limited to a certain set of tools and a certain application domain. In general, there are many works devoted to access control in big data management systems. In addition to those mentioned above, the works on integrating access control with data encryption based on artificial immune systems [16] and distributed ledger technology (blockchain) [17] should be especially noted.
This diversity suggests that we should expect the use of various access control models both at the level of individual tools and at the level of the entire big data system. Hence, the problem of harmonizing these models and policies in automated (rather than manual) mode to minimize data leaks and ensure the correctness of business logic becomes increasingly important.
The main problems mentioned in [13, 15, 18] allow us to identify the general features of big data management systems that make it currently difficult to implement comprehensive, consistent access control at the system level. These problems are
• the large number of users with different levels of access over the entire set of system nodes, from service personnel and database administrators to data sources and data consumers;
• the complexity and nonlinearity of the lifecycle of data fragments in big data systems, which makes it difficult to track these fragments and implement consistent access control;
• the different structuring of data during their processing in the system due to the use of different processing tools and multiple operations for extracting parts of data and combining them into new sets.
These problems lead to the main contradiction of access control in big data systems. On the one hand, access control in business applications that work with large volumes of data and implement heterogeneous granulation requires flexibility and scalability, which leads to high complexity of configuration, inaccuracies, and data leaks. On the other hand, with rigid access control at the level of individual system components or data sources, the probability of data leaks decreases; however, problems of business logic implementation and inaccessibility of certain data for customers arise. A solution to this contradiction lies in finding the optimal security policy (or at least a rational one, if we do not formulate the problem as an optimization problem due to the heterogeneity of criteria and metrics used in different organizations), as well as in automating its execution in a polystore access control system.
The practical implementation of this solution poses a certain challenge. As mentioned above, while researchers already propose knowledge-based intelligent systems, the tools themselves still implement RBAC or other types of access control inherent in traditional database management systems (discretionary and mandatory access control). This does not allow one to use attribute-based access control systems [14] without additional analytical components.
Thus, access control in big data management systems should be based on some analytical component that enables automated search for rational access parameters in a particular system, taking into account its components, operating principles, and business logic. An important requirement is the possibility of conducting this analysis both at the stage of designing an information system, including big data processing, and during its operation, to assess its level of protection and correctness of operation.
METHOD FOR AUTOMATED ACCESS CONTROL ANALYSIS IN HETEROGENEOUS BIG DATA SYSTEMS
In fact, access control analysis in big data management systems solves two main problems: reducing the number of data leaks due to security misconfiguration [19] and ensuring that data consumers are sufficiently informed to perform their business tasks [20]. To reduce the number of data leaks, it is necessary to consider the entire data lifecycle, because data fragments obtained at later stages can not only be semantically related to input data but also contain them without changes, whereas access to input data can be restricted by certain security settings [21]. Therefore, when analyzing an access control system, it is necessary to both check the access of consumers to input data taking into account the access to output data, and check the access to output data taking into account the access to input data. As a result of this analysis, a general security policy is formulated.
Thus, the automated analysis of access control is based on
• an output data access policy in terms of one of the security models (in the general case, any model); this component is determined by business logic;
• an input data access policy in terms of one of the security models (in the general case, any model); this component is determined by the trust minimization requirement;
• a data processing model, which describes the process of obtaining an output data fragment from an input one.
In the big data management system, the model of data processing from the initial fragments to the resulting ones must take into account the access rights of each processed fragment. For systems in operation, access settings can be extracted directly from data processing tools in an automated manner. At the stage of designing a big data management system, this is done by formalizing and analyzing access right transfer in a heterogeneous polystore [22–24].
To construct a data processing model, we use a graph model of data lifecycle proposed in [25, 26], where the vertices represent structured data fragments and the edges represent operations on them. This graph describes big data processing in a polystore and can be constructed automatically [26], which is a significant advantage.
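A minimal Python sketch of such a graph is given below; the fragment identifiers, storages, and operation labels are purely illustrative and only mirror the structure described above (vertices as data fragments, edges as operations).

```python
# Illustrative data lifecycle graph: vertices are data fragments,
# edges are operations (aggregation, division) that produce new fragments.
lifecycle_graph = {
    "vertices": {
        "f1": {"kind": "input", "store": "PostgreSQL"},
        "f2": {"kind": "input", "store": "MongoDB"},
        "f3": {"kind": "intermediate", "store": "Spark"},
        "f4": {"kind": "output", "store": "MongoDB"},
    },
    "edges": [            # (source fragment, derived fragment, operation)
        ("f1", "f3", "aggregation"),
        ("f2", "f3", "aggregation"),
        ("f3", "f4", "division"),
    ],
}

def parents(graph: dict, fragment: str) -> list[str]:
    """Fragments from which the given fragment is directly derived."""
    return [src for src, dst, _ in graph["edges"] if dst == fragment]

print(parents(lifecycle_graph, "f3"))  # ['f1', 'f2']
```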
Based on the analysis results, we decided to use ABAC as the basic access control mechanism, because it has a sufficient degree of granularity and access accuracy; in addition, other policies can be described in ABAC terms and rules [27, 28]. An extension in the form of KBAC can also be reduced to this form [29]. In the analysis system, the original access control rules are represented in terms of the attribute-based security model:
• AO and AS are sets of attributes of objects and subjects, respectively;
• O = {o1, o2, …, ok} is a set of nodes of the data processing graph, which represent data fragments, including the set of subjects S = {s1, s2, …, sn} of the big data processing system.
For ∀ oi ∈ O, i = 1, …, k, and for ∀ st ∈ S, t = 1, …, n, a set of attribute–value pairs is defined: {(a1, v1), (a2, v2), …, (am, vm)}, where aj ∈ AO ∪ AS and vj is the attribute value.
• Din ⊆ O is a set of input data fragments;
• Dout ⊆ O is a set of output data fragments;
• E is a set of edges of the data processing graph, which represent operations on data fragments (aggregation and division);
• P is a security policy defined as a set of rules {P1, P2, …, Pz} such that Pi = {conditionsubj, conditionobj, action, access}, where conditionsubj is a set of conditions for the subject, conditionobj is a set of conditions for the object (fragment), action is an action of the subject on the object (read, write, read and write), and access is a permission/prohibition for the subject to perform an action.
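As a sketch of how such rules can be represented and evaluated, the following Python fragment encodes a rule as the tuple {condition_subj, condition_obj, action, access} defined above; the matching logic (first matching rule wins, default deny) reflects the conventions used later in the experiment and is otherwise an assumption.

```python
from dataclasses import dataclass

@dataclass
class Rule:
    condition_subj: dict   # required subject attribute values
    condition_obj: dict    # required object (fragment) attribute values
    action: str            # "read", "write", or "read_write"
    access: bool           # True = permit, False = deny

def matches(conditions: dict, attributes: dict) -> bool:
    return all(attributes.get(a) == v for a, v in conditions.items())

def evaluate(policy: list[Rule], subj: dict, obj: dict, action: str) -> bool:
    # Prohibitive rules are placed first and therefore take priority;
    # if no rule matches, access is denied by default.
    for rule in policy:
        if rule.action == action and matches(rule.condition_subj, subj) \
                and matches(rule.condition_obj, obj):
            return rule.access
    return False
```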
Figure 1 shows the scheme of access control analysis in the forward direction for analyzing business logic.
Fig. 1. [Images not available. See PDF.]
Access control analysis mechanism (forward direction).
The data lifecycle graph is checked for consistency, errors, and cycles during data processing. This is required primarily for the analysis at the design stage, when the data processing graph is constructed based on the design documentation and is loaded into the system. For analyzing an access control system in operation, this step can be omitted based on security considerations; however, it additionally allows one to identify technical errors in the design of the data processing procedure and can therefore be used as a technological tool by the data engineer.
Then, all input fragments and intermediate fragments generated based on them are sequentially processed. The latter are also added to a list for access right evaluation until the final output data fragments, which do not generate new elements, are obtained. An additional advantage of this scheme is the possibility to detect errors in the design of the data processing system, e.g., the presence of unused (redundant) data in intermediate storages, etc.
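A minimal sketch of this forward pass over the lifecycle graph is shown below (the graph structure follows the earlier sketch; evaluate_access stands for any rule evaluation procedure and is an assumption).

```python
from collections import deque

def forward_pass(graph: dict, input_fragments: list, evaluate_access) -> tuple:
    """Walk the lifecycle graph from input fragments towards output fragments."""
    queue, visited, access_matrix = deque(input_fragments), set(), {}
    while queue:
        fragment = queue.popleft()
        if fragment in visited:
            continue
        visited.add(fragment)
        access_matrix[fragment] = evaluate_access(fragment)   # rights per subject
        for src, dst, _ in graph["edges"]:
            if src == fragment:
                queue.append(dst)        # derived fragment joins the evaluation list
    # Fragments never reached from any input are candidates for redundant data.
    unused = set(graph["vertices"]) - visited
    return access_matrix, unused
```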
The analysis in the reverse direction, from output data fragments to input ones, is carried out similarly. The general order of the reverse analysis is as follows.
1. A list of output data fragments is generated, and rights of access to them are granted based on system settings or some security policy; this yields list Oin.
2. For each resulting fragment from Oin, it is determined which data fragments were used to generate it, and list Otemp is formed; in other words, the inheritance relationships are determined based on the data lifecycle graph.
3. For each data fragment from Otemp, access rights are set based on the reverse analysis of access rules from policy P.
4. Some elements of Otemp are transferred to Oin, while the others are excluded from further consideration, and step 2 is repeated.
5. Possible access matrices for input data are finally generated and compared with the security policy and access control settings.
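The following sketch illustrates the reverse pass: starting from the output fragments, the lifecycle graph is walked backwards, and the minimum rights a subject needs on every ancestor fragment are accumulated (helper names and right labels are assumptions; the graph is assumed to be acyclic, as checked at the first step of the forward analysis).

```python
def parents(graph: dict, fragment) -> list:
    return [src for src, dst, _ in graph["edges"] if dst == fragment]

def reverse_pass(graph: dict, output_rights: dict) -> dict:
    """output_rights: {fragment: {subject: 'read' | 'read_write'}}."""
    required = {frag: dict(rights) for frag, rights in output_rights.items()}
    frontier = list(output_rights)
    while frontier:
        fragment = frontier.pop()
        for parent in parents(graph, fragment):
            inherited = required.setdefault(parent, {})
            changed = False
            for subject, right in required[fragment].items():
                if inherited.get(subject) not in ("read_write", right):
                    inherited[subject] = right      # keep the strongest right seen
                    changed = True
            if changed:
                frontier.append(parent)             # propagate further upstream
    return required
```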
Thus, this analysis method allows one to identify possible data leaks by detecting attempts of information consumers and system users (subjects) to access confidential input data.
In addition, the analysis allows one to identify discrepancies in access control settings and, importantly, to offer the security administrator possible alternatives of access control settings in polystore data processing tools, which are in compliance with the current policy of output data usage.
The resulting variants of access matrices can also be used to improve the system.
However, when no satisfactory result is obtained, this means that it is impossible to comply with the required security policy on the current set of tools with the current order of data processing. When modifying these components (data lifecycle graph, granulation, and access settings with right transfer rules), the analysis must be repeated.
APPLICATION OF THE ACCESS CONTROL ANALYSIS METHOD TO HETEROGENEOUS BIG DATA SYSTEMS
The general architecture of the framework for automated access control analysis in big data management systems is shown in Fig. 2.
Fig. 2. [Images not available. See PDF.]
Architecture of the framework for automated access control analysis in big data systems.
The input data of the analysis module can be both the characteristics of an existing system, which are collected automatically, and the parameters of a system at the design stage. The output data are access matrices obtained based on the current and desired security policies.
Let us illustrate the proposed automated analysis by a particular example. Since information in big data systems, including information that concerns access control and data routes, is confidential and can be exploited by intruders [30], we consider an anonymized big data management system based on a polystore with a reduced number of user roles.
As the test system, we used a data processing system of an organization with four distributed branches, each of which had four divisions: laboratory, management, executive office, and administrator. The software components of the test system were Apache Spark and data storages based on MongoDB and PostgreSQL. Thus, a heterogeneous environment with different data models was modeled. Information on the security policies implemented in the DBMSs was collected using query interfaces and a module written in Python. As part of the experiment, we used the RBAC model implemented in the data processing tools with their own data granulation: at the cell level in PostgreSQL and at the collection level in MongoDB. It should be noted that, for MongoDB, this is the most detailed level of access granulation.

Then, tables of input data fragments and system subjects, as well as sets of their attributes, were generated, and a general security policy was defined in terms of the attribute-based model. The policy consisted of more than ten different access control rules, including rules based on potential end-user access settings and rules built on the data from the data processing tools. The desired access control rules (desired security policies for input and output data) were also formulated in terms of the attribute-based model. By default, if the security policy did not contain a suitable rule, then access was denied. The prohibitive rules were placed at the beginning of the policy and had priority over the permissive rules. Thus, if both a prohibitive rule and a permissive rule were found for a fragment and a subject at different stages of the data lifecycle, then access was denied.
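As an illustration of the data collection step, the sketch below shows how role and grant information of this kind can be pulled from PostgreSQL and MongoDB; the connection parameters are placeholders, psycopg2 and pymongo are assumed to be available, and this is not the actual module used in the experiment.

```python
import psycopg2
from pymongo import MongoClient

def collect_postgres_grants(dsn: str = "dbname=test user=auditor") -> list:
    # Table-level grants from the system catalog; finer (cell-level) settings
    # would require additional, installation-specific queries.
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute("SELECT grantee, table_name, privilege_type "
                    "FROM information_schema.role_table_grants;")
        return cur.fetchall()

def collect_mongo_roles(uri: str = "mongodb://localhost:27017") -> list:
    # Users with their roles; the collection is the finest granulation level
    # available to the analysis in MongoDB.
    client = MongoClient(uri)
    return client["admin"].command("usersInfo")["users"]
```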
During processing, data are modified: combined, divided, and converted. Therefore, when using the attribute-based model, the problem of inheritance of both the access rights and the attributes obtained as a result of these data operations arises. In the case of division, it is generally sufficient to transfer attributes to all child fragments without changes, whereas in the case of union and conversion, it is not so simple. As a solution for the union operation, it is proposed to add an additional characteristic—inheritance type—to the attribute. The implemented program had the following attribute inheritance types when generating new data fragments:
• max: takes the maximum attribute value among all parent ones;
• min: takes the minimum attribute value among all parent ones;
• concatenate: combines values into a list if they are different.
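A minimal sketch of these inheritance rules for a union of several parent fragments is given below (attribute values are illustrative).

```python
def inherit(inheritance_type: str, parent_values: list):
    """Compute the attribute value of a fragment produced by a union operation."""
    if inheritance_type == "max":
        return max(parent_values)
    if inheritance_type == "min":
        return min(parent_values)
    if inheritance_type == "concatenate":
        merged = []
        for value in parent_values:
            if value not in merged:
                merged.append(value)      # keep distinct values as a list
        return merged
    raise ValueError(f"unknown inheritance type: {inheritance_type}")

print(inherit("max", [2, 5, 3]))                      # 5
print(inherit("concatenate", ["lab", "lab", "hq"]))   # ['lab', 'hq']
```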
The inheritance rules were not applied to the attributes of the subjects who had access to data, because the inheritance of user rights was not considered in the security model. Thus, a structure was developed to describe each attribute of a subject or object in the system and include it in decision rules for access control and transfer. The data fragment lifecycle graph is shown in Fig. 3. In the system, this graph is represented as an adjacency matrix with additionally stored vertex weights (types of data fragments) and edge weights (data operations).
Fig. 3. [Images not available. See PDF.]
Data fragment lifecycle graph.
Table 1 shows an example of the access matrix obtained by the forward direction analysis of access control in the test system. The columns of the table correspond to the subjects of the system, the rows correspond to data fragments, and the cells contain the permissions granted to the subjects for accessing these fragments. This table includes output data fragments 14, 16, 20, and 21.
Table 1. Access matrix for output data (forward direction analysis)

| Fragment ID | User 1 | User 2 | User 3 | User 4 | User 5 |
|---|---|---|---|---|---|
| 14 | Denied | Denied | Denied | Denied | Denied |
| 16 | Denied | Read | Read, Write | Read | Denied |
| 20 | Denied | Read | Read, Write | Read | Read |
| 21 | Denied | Read | Read, Write | Read | Read |
The access matrix obtained as a result of the analysis suggests the possible existence of several significant access control problems in the system under consideration. First, provided that the system and its users are represented completely (rather than partially), the denial of access to fragment 14 for all users indicates that it is unavailable for further use. This fragment (dataset) should be moved from the system to an archive or be deleted; otherwise, the rights of access to this fragment must be changed for some users. The latter will obviously expand their access to the input data fragments from which this fragment is derived. In turn, this expansion can violate the security policy and lead to a data leak. It should also be noted that User 1 has no access to any output data, which is also abnormal for a full-fledged system.
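The two checks discussed above (a fragment denied to every user and a user with no access to any output fragment) are easy to automate; the sketch below applies them to an access matrix shaped like Table 1 and is a hypothetical illustration rather than part of the prototype.

```python
def find_issues(access_matrix: dict) -> tuple:
    """access_matrix: {fragment_id: {user: 'Denied' | 'Read' | 'Read, Write'}}."""
    dead_fragments = [f for f, row in access_matrix.items()
                      if all(r == "Denied" for r in row.values())]
    users = {u for row in access_matrix.values() for u in row}
    idle_users = [u for u in users
                  if all(row.get(u) == "Denied" for row in access_matrix.values())]
    return dead_fragments, idle_users

matrix = {14: {"User 1": "Denied", "User 2": "Denied"},
          16: {"User 1": "Denied", "User 2": "Read"}}
print(find_issues(matrix))   # ([14], ['User 1'])
```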
Let us now consider the result of the reverse analysis (see Table 2). The table shows an access matrix with the minimum rights of access to input fragments that the users must have to receive the output fragments allowed to them without violating the security policy.
Table 2. Access matrix for input data (reverse analysis)

| Fragment ID | User 1 | User 2 | User 3 | User 4 | User 5 |
|---|---|---|---|---|---|
| 1 | Denied | Denied | Denied | Denied | Denied |
| 2 | Denied | Read | Denied | Denied | Denied |
| 3 | Denied | Denied | Denied | Denied | Denied |
| 4 | Denied | Denied | Denied | Read | Denied |
| 5 | Denied | Denied | Denied | Denied | Denied |
| 6 | Read, Write | Denied | Denied | Denied | Denied |
| 7 | Read, Write | Denied | Read, Write | Denied | Denied |
| 8 | Denied | Denied | Denied | Denied | Read |
| 9 | Denied | Denied | Read, Write | Denied | Denied |
The next step in the security analysis is to compare the resulting matrix with the security policy to detect possible data leaks. For instance, should User 5 have read access to fragment 8?
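This comparison can also be automated: the sketch below flags every case where the reverse analysis requires a right that the declared input data policy denies (the data structures mirror Table 2 and are illustrative).

```python
def find_policy_violations(required_rights: dict, policy_permits: dict) -> list:
    """Both arguments: {fragment_id: {user: 'Denied' | 'Read' | 'Read, Write'}}."""
    violations = []
    for fragment, row in required_rights.items():
        for user, right in row.items():
            permitted = policy_permits.get(fragment, {}).get(user, "Denied")
            if right != "Denied" and permitted == "Denied":
                violations.append((user, fragment, right))
    return violations

required = {8: {"User 5": "Read"}}        # minimum rights from the reverse analysis
policy = {8: {"User 5": "Denied"}}        # input fragment 8 is declared confidential
print(find_policy_violations(required, policy))   # [('User 5', 8, 'Read')]
```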
Thus, the proposed method for automated access control analysis in big data management systems allows one to detect possible data leaks and business logic vulnerabilities. The analysis can be carried out at different stages of the system lifecycle, from design to maintenance/support.
DISCUSSION OF THE RESULTS
As a result of this study, the main problems of using individual access control methods in big data systems were identified. Taking them into account, the main contradiction was, for the first time, explicitly formulated between the flexibility of access control settings in big data systems and the increased workload on the security administrator, which is caused, among other things, by the presence of heterogeneous polystore components in the data processing subsystem that prevent efficient automation of access control analysis. As mentioned above, this problem is quite new and has been solved only partially even at the research level: for homogeneous storages of the Hadoop ecosystem with one access control model. Solutions for polystores have not yet been proposed, except for the one presented in this paper.
To resolve the contradiction formulated above, we propose a new method for automated access control analysis based on the initial and desired security policies and the data processing graph. The method includes a two-pass analysis of access rules in the forward and backward directions over the data processing graph. It should be noted that, as a result of one cycle of the developed method, the security administrator is provided with access matrices for further analysis, which allow him or her to assess several characteristics of the access control system. In this case,
• the forward direction analysis of the access control system (from input to output data) generates an access matrix, which makes it possible to check the system with the desired security policy for business logic vulnerabilities;
• the reverse analysis (from output data to input data) makes it possible to assess the possibility of data leaks as a result of logical inference based on the semantic coherence of data fragments, as well as to analyze the implementation of the principle of least privilege.
The iterative matching of the results of the forward and reverse analyses by adjusting security policies or settings for access right transfer in individual tools allows the security administrator (analyst) to obtain a rational access control policy for the entire system, which can also be automatically converted into the rules of a particular tool. The fact that the iterative analysis, except for the data collection stage, is carried out on a model makes it possible to reduce the workload and avoid unnecessary interference in the operation of the system. However, it should be noted that data collection for the analysis of an operating big data system cannot be carried out without increasing the workload, because the data processing graph has to be constructed based on the migration of data fragments.
The proposed method and framework for automated access control analysis in big data systems, first of all, allows the security administrator to reduce the cost for collecting access control information, as well as facilitate the manual analysis of access right transfer among data processing tools. Unfortunately, the exact amount of time saved by the automation cannot be estimated, because it depends significantly on the complexity of a particular big data system and its data lifecycle, the number of types of system users, and the number of types of heterogeneous data fragments. Nevertheless, we can speak of a reduction in the amount of time required for data collection and, most importantly, for finding a rational security policy. For the anonymized system considered as an example, this time was several hours.
It should also be noted that the proposed method and framework can be integrated into a big data system together with other solutions that provide access control for homogeneous components and information processing subsystems described above. The possibility of implementing the proposed method in heterogeneous polystores, the components of which use their own access control systems, is its key difference and advantage. In the experiment, we used DBMSs with different data structuring (postrelational and document-oriented) from different manufacturers. The solution can be expanded to other classes of systems or individual systems only by adding the corresponding functional components to the data collection module, because the organization of system catalogs and dialects of query languages differ significantly in different systems. It may also be necessary to adjust the mapping of access control rules from the data processing tool to the attribute-based model. From a scientific perspective, the problem of constructing a highly detailed data processing graph for arbitrary data storages still poses a challenge. Its solution requires integrating the auditing components of data processing nodes (e.g., distributed ledger-based audit [26]) with internal data conversions in DBMS when executing queries. The latter can be obtained based on the analysis of query execution plans in relational systems and MapReduce pipelines in nonrelational solutions. The collection of these data in modern DBMSs is also easy to automate; however, it requires individual modules for integration with each particular type or even version of the system. Seamless combination of these plans with results of data flow audit seems to be an important problem for further research.
In addition to its scientific novelty, the proposed method has the following advantages in terms of practical implementation:
• it is suitable for automated access control analysis in big data management systems at different stages of their lifecycle, both at the design stage and at the operational stage;
• implementations of individual steps of the method can be used to solve the related problems, e.g., analyzing the quality of a big data management system.
Certain disadvantages of the proposed solution are currently the need to develop an automated data collection module for each data processing tool used in the system, relatively high-level skills of the security administrator, and, in part, relatively high resource requirements for collecting information from an operating big data processing system.
Thus, the further research in this direction will consist in increasing the degree of automation by integration with data storages of various types, as well as in improving algorithms and methods for data collection and analysis. It is also reasonable to develop methods for decision making support in the analysis of security policies and automate the process of finding rational decisions based on methods of multi-criteria discrete optimization and logic algebra.
CONCLUSIONS
Ensuring big data security, even at one level of consideration, is a complex technical problem. At the level of data processing, it consists primarily in overcoming the heterogeneity and inconsistency of software tools, which is aggravated by business logic requirements (e.g., flexibility), the complexity of systems and large volumes of data, as well as the lack of standards and ready-made solutions.
The problem of automated access control analysis is an integral part of this general problem. Extraction of access settings from data processing tools and their automatic reconfiguration is a well-automated task, whereas finding rational access settings is an extremely complex and time-consuming task. The key requirement for its implementation is the consistency of access control across all data processing tools, which is achieved in this work through the data fragment lifecycle model.
The proposed two-pass analysis method makes it possible to iteratively achieve rational access control settings and facilitate the check for compliance between the access control settings in a real-world system and the required security policy. The analysis in the reverse direction (from data receivers to data sources) allows one to identify inconsistencies and possible data leaks due to the violation of the principle of least privilege. In turn, the forward direction analysis (from data sources to data receivers) makes it possible to speed up the check for business logic vulnerabilities.
The proposed method and prototype tool for automated access control analysis in big data management systems can be used as a basis for constructing various scientific and technical solutions, including auditing tools for systems of this class, tools for their security assessment, intrusion detection tools, etc.
FUNDING
The study was supported by the Russian Science Foundation, grant no. 23-11-20003, https://rscf.ru/project/23-11-20003/, and by the St. Petersburg Science Foundation (agreement no. 23-11-20003 on the regional grant).
CONFLICT OF INTEREST
The authors of this work declare that they have no conflicts of interest.
Translated by Yu. Kornienko
Publisher’s Note.
Pleiades Publishing remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
AI tools may have been used in the translation or editing of this article.
REFERENCES
1. Mushtaq, M.S., Security, integrity, and privacy of cloud computing and big data, 2022. https://doi.org/10.1201/9781003107286-2
2. Yung, L.R.B., Ströele, V., and Dantas, M.A.R., A polystore proposed environment supported by an edge-fog infrastructure, Proc. Int. Conf. Advanced Information Networking and Applications, 2023, pp. 292–302. https://doi.org/10.1007/978-3-031-28451-9_26
3. Gao, J., Analysis of enterprise financial accounting information management from the perspective of big data, Int. J. Sci. Res., 2022, vol. 11, pp. 1272–1276.
4. Vasa, J. and Thakkar, A., Deep learning: Differential privacy preservation in the era of big data, J. Comput. Inf. Syst., 2023, vol. 63, pp. 608–631. https://doi.org/10.1080/08874417.2022.2089775
5. Dhiman, G., Federated learning approach to protect healthcare data over big data scenario, Sustainability, 2022, vol. 14, pp. 1–14. https://doi.org/10.3390/su14052500
6. Strzelecki, A. and Rizun, M., Consumers' change in trust and security after a personal data breach in online shopping, Sustainability, 2022, vol. 14, pp. 1–17. https://doi.org/10.3390/su14105866
7. Zhuang, Y., Research on big data access control mechanism, Int. J. Comput. Sci. Eng., 2023, vol. 26, pp. 192–198. https://doi.org/10.1504/IJCSE.2023.129738
8. Jiang, R., T-RBAC model based on two-dimensional dynamic trust evaluation under medical big data, Wireless Commun. Mobile Comput., 2021, vol. 2021, pp. 1–17. https://doi.org/10.1155/2021/9957214
9. Gupta, M., Patwa, F., and Sandhu, R., Object-tagged RBAC model for the Hadoop ecosystem, Proc. IFIP Annu. Conf. Data and Applications Security and Privacy, 2017, pp. 63–81. https://doi.org/10.1007/978-3-319-61176-1_4
10. Servos, D. and Osborn, S.L., Current research and open problems in attribute-based access control, ACM Comput. Surv., 2017, vol. 49, pp. 1–45. https://doi.org/10.1145/3007204
11. Zeng, W., Yang, Y., and Luo, B., Content-based access control: Use data content to assist access control for large-scale content-centric databases, Proc. IEEE Int. Conf. Big Data, 2014, pp. 701–710. https://doi.org/10.1109/BigData.2014.7004294
12. El Haourani, L., Elkalam, A.A., and Ouahman, A.A., Knowledge based access control a model for security and privacy in the big data, Proc. 3rd Int. Conf. Smart City Applications, 2018, pp. 1–8. https://doi.org/10.1145/3286606.3286793
13. Anisetti, M., et al., Dynamic and scalable enforcement of access control policies for big data, Proc. 13th Int. Conf. Management of Digital EcoSystems, 2021, pp. 71–78. https://doi.org/10.1145/3444757.3485107
14. Tall, A.M. and Zou, C.C., A framework for attribute-based access control in processing big data with multiple sensitivities, Appl. Sci., 2023, vol. 13, pp. 1–28. https://doi.org/10.3390/app13021183
15. Colombo, P. and Ferrari, E., Access control technologies for big data management systems: Literature review and future trends, Cybersecurity, 2019, vol. 2, pp. 1–13. https://doi.org/10.1186/s42400-018-0020-9
16. Muneeshwari, P. and Athisha, G., Extended artificial immune system-based optimized access control for big data on a cloud environment, Int. J. Commun. Syst., 2020, vol. 33, pp. 1–15. https://doi.org/10.1002/dac.3947
17. Mounnan, O., Abou El Kalam, A., and El Haourani, L., Decentralized access control infrastructure using blockchain for big data, Proc. IEEE/ACS 16th Int. Conf. Computer Systems and Applications (AICCSA), 2019, pp. 1–8. https://doi.org/10.1109/AICCSA47632.2019.9035221
18. Vijayalakshmi, K. and Jayalakshmi, V., Shared access control models for big data: A perspective study and analysis, Proc. Int. Conf. Intelligent Computing, Information and Control Systems (ICICCS), 2020, pp. 397–410. https://doi.org/10.1007/978-981-15-8443-5_33
19. Hu, V.C., et al., An access control scheme for big data processing, Proc. 10th IEEE Int. Conf. Collaborative Computing: Networking, Applications and Worksharing, 2014, pp. 1–7. https://doi.org/10.4108/icst.collaboratecom.2014.257649
20. Oussous, A., Big data technologies: A survey, J. King Saud Univ. Comput. Inf. Sci., 2018, vol. 30, pp. 431–448. https://doi.org/10.1016/j.jksuci.2017.06.001
21. Centonze, P., Security and privacy frameworks for access control big data systems, Comput., Mater. Continua, 2019, vol. 59, pp. 361–374. https://doi.org/10.32604/cmc.2019.06223
22. Dziedzic, A., Elmore, A.J., and Stonebraker, M., Data transformation and migration in polystores, Proc. IEEE High Performance Extreme Computing Conf. (HPEC), 2016, pp. 1–6. https://doi.org/10.1109/HPEC.2016.7761594
23. Kroll, J.A., Kohli, N., and Laskowski, P., Privacy and policy in polystores: A data management research agenda, in Heterogeneous Data Management, Polystores, and Analytics for Healthcare, 2019. https://doi.org/10.1007/978-3-030-33752-0_5
24. Poudel, M., Processing analytical queries over polystore system for a large astronomy data repository, Appl. Sci., 2022, vol. 12, pp. 1–23. https://doi.org/10.3390/app12052663
25. Poltavtseva, M.A. and Kalinin, M.O., Modeling big data management systems in information security, Autom. Control Comput. Sci., 2019, vol. 53, pp. 895–902. https://doi.org/10.3103/S014641161908025X
26. Poltavtseva, M.A., et al., Data protection in heterogeneous big data systems, J. Comput. Virol. Hacking Tech., 2023, pp. 1–8. https://doi.org/10.1007/s11416-023-00472-3
27. Sahani, G., Thaker, C., and Shah, S., Supervised learning-based approach mining ABAC rules from existing RBAC enabled systems, EAI Endorsed Trans. Scalable Inf. Syst., 2022, vol. 10, pp. 1–8. https://doi.org/10.4108/eetsis.v5i16.1560
28. Talegaon, S., et al., Contemporaneous update and enforcement of ABAC policies, Proc. 27th ACM Symp. Access Control Models and Technologies, 2022, pp. 31–42. https://doi.org/10.1145/3532105.3535021
29. Gupta, T. and Sural, S., Ontology-based evaluation of ABAC policies for inter-organizational resource sharing, Proc. 9th ACM Int. Workshop Security and Privacy Analytics, 2023, pp. 85–94. https://doi.org/10.1145/3579987.3586572
30. Yang, K., et al., An efficient and fine-grained big data access control scheme with privacy-preserving policy, IEEE Internet of Things J., vol. 4, no. 2, pp. 563–571. https://doi.org/10.1109/JIOT.2016.2571718