The development and use of scientific applications have become an integral part of conducting large-scale experiments in various fields of research that require high-performance computing and big data processing. In developing such applications, non-trivial problems arise in the coordinated description and subsequent use of schemes, software, and computational resources to solve the subject domain problems of a specific application. Research productivity has become highly dependent on the degree of automation in the preparation and execution of experiments in a computing environment whose resources may be distributed and heterogeneous. Many approaches to experiment automation are based on workflows as a structure for formalizing and specifying data processing and high-performance computing using distributed applications. Within such approaches, developers and end-users work with workflow management systems for the collaborative development and use of distributed scientific applications. Nowadays, service-oriented applications are coming to the fore. However, there is a wide range of problems related to the support of modular scientific applications, the standardization of their components and interfaces, the use of heterogeneous information and computing resources, and the organization of interdisciplinary research within a service-oriented architecture. Known workflow management systems do not fully address these problems. In this regard, we consider relevant aspects of organizing service-oriented computing in a heterogeneous distributed computing environment. We propose a new framework for creating service-oriented and workflow-based scientific applications. The paper shows that the proposed framework significantly extends and complements the capabilities of systems for such purposes. We also demonstrate the reduction in labour costs associated with the preparation and execution of experiments.
INTRODUCTION
The automation of computation based on workflows (WFs) to solve large resource-intensive problems has undoubtedly increased the productivity of scientific research. In recent years, WFs have become the basis for abstractions covering data processing and high-performance computing using distributed applications. At the same time, the use of specialized workflow management systems (WMSs) often frees application developers and end-users from the need to delve into the details of WF execution and management in a heterogeneous distributed computing environment (HDCE).
WMSs, such as UNICORE [1], HTCondor [2], Pegasus [3], and other systems [4, 5], are powerful tools for the collaborative development and use of distributed scientific applications. They are designed to integrate software, plan schemes for solving scientific and applied problems, allocate resources for computing, launch and manage computational processes, process data, and implement other system operations in a distributed software and hardware environment.
In the context of the WMS development, particular attention is currently being paid to supporting service-based scientific applications (SBSA) [5]. Service-oriented programming aims to develop software systems that support the interaction of applications and services of different types based on the exchange of messages using published and discoverable interfaces [6]. Services often provide a good way to implement application operations for computing and data processing within business processes in various research domains. Thus, the development of service-oriented computing is largely due to effective solutions to a number of problems, including support for standards for modular scientific applications, their components and interfaces, the use of heterogeneous information and computing resources, and the organization of interdisciplinary research.
Unfortunately, the solution to the above problems has not been fully implemented in the known WMSs supporting SBSAs (see e.g. [7–9]). In this context, the paper discusses important aspects of the development and application of SBSAs. We present a new framework whose components extend and complement the functionality of known WMSs in this direction.
The rest of the paper is organized as follows. Section 2 provides a brief overview of relevant aspects in the organization of service-oriented computing. In Section 3, we consider known WF standards. Section 4 presents the proposed Framework for Development and Execution of Scientific WorkFlows (FDE-SWFs). The WF scheduler is described in Section 5. Section 6 provides a comparative analysis of the FDE-SWFs functionality with respect to the capabilities of WMSs supporting SBSAs. The practical use of FDE-SWFs is shown in Section 7. In Section 8, we discuss the limitations of the proposed framework. Finally, Section 9 concludes the paper.
RELATED WORK
The paradigm for the development and use of SBSAs represents a logical evolution from object-oriented systems to service systems. As in object-oriented systems, some fundamental concepts of Web services are encapsulation, message passing, and dynamic binding. However, the service-based paradigm goes beyond method signatures. Information about the functions of the service, its location, access methods, etc. can also be presented in the service interface. The development of SBSAs can also be seen as an evolution of the modular approach to programming, since Web services are lightweight, loosely coupled, platform and language independent components.
The current paradigm under consideration is dominated by service-oriented architecture (SOA). SOA is based on the use of many independent Web services that perform predefined operations related to the execution of system or applied applications. A Web service is a software system with standardized interfaces, identified by a unique Web address (URL) [10]. However, Web services have no knowledge about the applications they are running. At the same time, applications do not need information about how Web services execute them. SOA-based Web technologies are actively supported by large development companies, ensuring their widespread distribution and use.
Supporting SOA involves additional costs compared to other software systems integration techniques. However, with respect to the computing environment, SOA offers the following important advantages for its organization and use [11]:
reuse of environment components to create complex distributed software systems;
a modular approach to software development;
supporting network access to environment components for their developers and users, as well as their interaction with each other;
ensuring the openness of the environment through the use of data transfer protocol standards and the representation of service operations on this data;
cross-platform support, which reduces the dependency of computational processes on the software and hardware platforms and programming languages used;
the ability to easily integrate software from different developers.
SOA-based software is typically implemented as a set of Web services that communicate using the Simple Object Access Protocol (SOAP). A Web service is a unit of modularity within SOA software. At the same time, SOA can be implemented using a wide range of additional technologies, such as REpresentational State Transfer (REST), Remote Procedure Call (RPC), Distributed Component Object Model (DCOM), and Common Object Request Broker Architecture (CORBA).
The main formats for representing structured data are eXtensible Markup Language (XML) and JavaScript Object Notation (JSON), which support data validation using XML Schema and JSON Schema respectively. Unstructured data is typically represented as text files or other file formats. Data is transferred between Web services by including it in the body of the message (when the amount of information transferred is small) or by providing the Uniform Resource Locator (URL) address from which it can be downloaded (when the amount of data transferred is large).
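The two transfer modes described above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the size cutoff `INLINE_LIMIT_BYTES`, the message fields `value` and `href`, and the `example.org` URLs are hypothetical and are not taken from the paper.

```python
import json

# Hypothetical cutoff; the paper does not state a concrete size limit.
INLINE_LIMIT_BYTES = 64 * 1024


def build_message(name, data, download_url):
    """Return a JSON message carrying `data` inline or by reference."""
    encoded = json.dumps(data)
    if len(encoded.encode("utf-8")) <= INLINE_LIMIT_BYTES:
        # Small payload: embed the value in the body of the message.
        return json.dumps({"parameter": name, "value": data})
    # Large payload: send only the URL from which the data can be downloaded.
    return json.dumps({"parameter": name, "href": download_url})


small = json.loads(
    build_message("threshold", [0.1, 0.2], "http://example.org/data/threshold.json")
)
large = json.loads(
    build_message("grid", list(range(100_000)), "http://example.org/data/grid.json")
)
```

Here `small` carries its value in the message body, while `large` carries only a reference that the receiving service would have to dereference.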
There are different ways to describe Web services [12]. These include Web Service Description Language (WSDL) for describing Web services based on SOAP and Web Application Description Language (WADL) for describing Web applications based on HyperText Transfer Protocol (HTTP), including Web services in the REST style. In both cases, XML is used as the basic description language.
WSDL is designed to describe Web services, how to access them, and how to send messages between them. The description of a Web service in WSDL includes the following main sections:
definition of data types, which specify the type of XML messages sent and received by the service, and which are validated using XML Schema;
description of data elements as a list of messages used by the service;
defining abstract operations (ports) – a list of methods that can be performed on messages;
linking services to define methods for message delivery;
address of the service call.
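As an illustration of these sections, the following sketch parses an abridged WSDL 1.1 description with the Python standard library and extracts the parts listed above. The `Solver` service, its messages, and the `example.org` address are hypothetical; the binding is shortened and the document is not meant to be a complete, deployable description.

```python
import xml.etree.ElementTree as ET

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"
SOAP_NS = "http://schemas.xmlsoap.org/wsdl/soap/"

# Hypothetical, abridged service description used only for illustration.
WSDL_TEXT = f"""<definitions name="Solver"
    targetNamespace="http://example.org/solver"
    xmlns="{WSDL_NS}" xmlns:soap="{SOAP_NS}"
    xmlns:tns="http://example.org/solver"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <types>
    <xsd:schema targetNamespace="http://example.org/solver"/>
  </types>
  <message name="SolveRequest">
    <part name="input" type="xsd:string"/>
  </message>
  <message name="SolveResponse">
    <part name="output" type="xsd:string"/>
  </message>
  <portType name="SolverPort">
    <operation name="solve">
      <input message="tns:SolveRequest"/>
      <output message="tns:SolveResponse"/>
    </operation>
  </portType>
  <binding name="SolverBinding" type="tns:SolverPort">
    <soap:binding style="rpc" transport="http://schemas.xmlsoap.org/soap/http"/>
  </binding>
  <service name="SolverService">
    <port name="SolverEndpoint" binding="tns:SolverBinding">
      <soap:address location="http://example.org/solver"/>
    </port>
  </service>
</definitions>"""

root = ET.fromstring(WSDL_TEXT)
ns = {"w": WSDL_NS, "soap": SOAP_NS}

# One entry per section: types, messages, abstract operations, binding, address.
summary = {
    "types": len(root.findall("w:types", ns)),
    "messages": [m.get("name") for m in root.findall("w:message", ns)],
    "operations": [o.get("name") for o in root.findall("w:portType/w:operation", ns)],
    "bindings": [b.get("name") for b in root.findall("w:binding", ns)],
    "address": root.find("w:service/w:port/soap:address", ns).get("location"),
}
```

The sketch uses WSDL 1.1 element names (`types`, `message`, `portType`, `binding`, `service`), which correspond one-to-one to the five sections in the list.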
The latest official specification of the WSDL language, version 2.0, allows us to describe both calls to various specialized Web services based on SOAP, such as WPS services, and services based on other protocols, such as REST services.
Developing and using SBSAs has a number of advantages over other types of applications. Having a set of services in the SBSA allows application developers to create, debug, test, deploy and modify their services independently of other developers. This greatly simplifies the development of distributed applications.
Each service can be developed and deployed on different resources with different performance characteristics, amounts of RAM and disk memory, interconnect bandwidth, etc. in Grid systems, on supercomputer resources, or on cloud platforms. With containerization, services can be run on multiple parallel nodes without the need to deploy the entire application to a new node.
An important advantage of SBSAs is their fault tolerance. The failure of a service does not usually lead to the failure of the entire application. In this case, a failed service can easily be restarted or its operations can be taken over by other services if there is computational redundancy in the SBSA.
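A minimal sketch of this behaviour is given below, assuming that redundant services are interchangeable callables and that a failure shows up as a `ConnectionError`. The function names and the retry count are hypothetical and do not describe any particular WMS.

```python
def call_with_failover(replicas, payload, retries_per_replica=2):
    """Retry a service a few times, then hand the call over to the next replica."""
    errors = []
    for service in replicas:
        for _ in range(retries_per_replica):
            try:
                return service(payload)
            except ConnectionError as exc:
                # Remember the failure and try again (or move to the next replica).
                errors.append(str(exc))
    raise RuntimeError("all replicas failed: " + "; ".join(errors))


def broken_service(payload):
    """Stand-in for a failed service."""
    raise ConnectionError("primary service is down")


def backup_service(payload):
    """Stand-in for a redundant service performing the same operation."""
    return payload * 2


result = call_with_failover([broken_service, backup_service], 21)
```

The application as a whole still produces a result, because the operation of the failed service is taken over by the redundant one.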
Table 1 provides an overview of the developments in the field of service-oriented computing. The following important aspects of supporting service-oriented computing at the levels of the application and/or the computing environment as a whole are considered: different types of services, methods for their specification, runtime architecture, service-oriented models for representing various entities of the computing environment, development tools, control systems, SBSAs for different subject domains, etc.
Table 1. Developments in the field of service-oriented computing
Source | Support aspects | Support level |
|---|---|---|
[13–17] | Grid and Cloud computing, SOAP services, Grid services, SaaS, Globus Toolkit. | Environment |
[9] | Cloud Computing, SOAP services, WMS, WaaS Cloud Platform | Application |
[8] | Cloud Computing, SOAP services, WMS, HyperFlow | Application |
[18, 19] | SOAP services specification methodology, WF specification methodology, WSDL, BPEL | Application |
[20, 21] | Microservices, service compositions, program synthesis | Application |
[22] | REST | Environment |
[1] | Grid and Cloud computing, Grid services, SOAP, REST, WMS, UNICORE | Environment |
[7] | Cloud computing, SOAP, REST, WMS, Galaxy | Environment |
[23] | Cooperative computing, data management, microservices, iRODS | Environment, application |
[24] | Cloud computing, SOAP, REST, MathCloud, Everest | Environment, application |
[25] | Grid and Cloud computing, SOAP services, Grid services, CAEBeans, testbeds | Environment |
[26, 27] | Cloud computing, WF, Web services, intelligent computing management, IaaS, SaaS, PaaS, iPSE | Environment |
[28] | SBSA | Application |
[29] | HPC, Amazon Web Services, Google Compute Engine, OpenStack, Cloud Stack, IaaS, PaaS, SaaS | Environment |
[30, 31] | SBSA, data processing | Environment, application |
[32] | SBSA, data processing | Environment, application |
[33] | Energy research, SBSA, data processing | Environment |
[34, 35] | Geoinformatics, REST services, SOAP services, WPS services, service composition | Environment |
[36] | REST services, SOAP services, microservices, service composition | Application |
[37] | WPS services, service composition, HDCE | Environment, application |
[38] | Multiagent resource management, microservices, service templates | Environment |
WORKFLOW STANDARDS
WFs enable large-scale scientific experiments to be carried out using large data sets. In this case, data processing operations are distributed across different computing resources. WFs may include operations for discovering and linking resources, and for collecting, processing, analyzing, and visualizing data. WFs must be logical, structured, and reliable.
WF operations are performed according to the problem-solving scheme in a logical sequence determined by the WF structure. The use of standards for the description and execution of WFs allows their dissemination in the scientific community and facilitates their reuse. WFs can be deposited in public repositories.
WF representation standards originate in the field of business process modeling. Corresponding solutions have been developed by a number of commercial organizations such as IBM and Microsoft. Open standards are developed by independent consortia, including the World Wide Web Consortium (W3C), the Organization for the Advancement of Structured Information Standards (OASIS), the Workflow Management Coalition (WFMC), the Business Process Management Initiative (BPMI), the United Nations Centre for Trade Facilitation and Electronic Business (UN/CEFACT), and the Object Management Group (OMG) [39]. Some consortia focus their efforts on developing sets of complementary standards, while others develop individual multi-purpose standards. There is as yet no consensus on which standards are most appropriate for SBSAs. Furthermore, there is no established structure of standards for SOA.
WMSs are often characterized by describing processes in terms of data flow rather than business process control flow orientation. A number of research projects are comparing the applicability of different WF description standards. An important direction is the development of standards that take into account the requirements for computation and data transfer for large data sets, as well as ensuring the separation of the abstract representation level of the WF from the level of its execution on specific software and hardware resources. In general, the successful implementation of WF depends on the use of a system of standards, each of which ensures the efficient planning and execution of computing and data processing operations.
A number of standards for describing WFs have been proposed by various commercial organizations and consortia [40]. For example, the following languages have been developed: XML Process Definition Language (XPDL), XLANG, Web Services Flow Language (WSFL), Business Process Modeling Language (BPML), Business Process Model and Notation (BPMN), Business Process Specification Schema (BPSS), Web Services Conversation Language (WSCL), Web Services Choreography Interface (WSCI), Yet Another Workflow Language (YAWL), Business Process Execution Language for Web Services (BPEL4WS or BPEL) 1.0, BPEL4WS 1.1, Web Services Choreography Description Language (WS-CDL), and Web Services Business Process Execution Language (WS-BPEL or BPEL) 2.0.
XPDL, developed by the Workflow Management Coalition (WFMC), is designed to exchange process definitions between different information systems, both graphically and semantically. XPDL has been revised several times. The last revision took place in 2012.
Microsoft’s XLANG is an extension of WSDL. Its main purpose is to define business processes and organize the exchange of messages between Web services.
WSFL, developed by IBM, is an XML language that describes a business process as a composition of Web services that describe the sequence of calls to service operations. The order of operations is determined based on the flow of control and data between services. A business process defines operations for receiving, processing, and exchanging data in a particular order.
BPML is an XML-based business process description language introduced by BPMI. It provides tools for performing sequential and parallel operations, supports branches and loops, provides standard functions for calling services, sending and receiving messages, and allows the WF developer to plan the execution of tasks according to a given schedule. BPML provides for the management of long-running WFs. The further development of BPML is BPMN.
BPSS is a standard framework that describes the process of information exchange. BPSS is based on the UN/CEFACT meta-model. It enables organisations to define and collaborate on business transactions between partners to exchange documents and signals electronically for business purposes. BPSS is part of the Electronic Business using XML (ebXML) toolkit from OASIS and UN/CEFACT.
Hewlett-Packard’s WSCL is designed to define business-level conversations as public processes supported by Web services. WSCL defines the order in which XML documents are exchanged. WSCL conversation definitions are also XML documents and can therefore be interpreted by Web services.
YAWL is an extension of XML. It is intended for the formalized description of business processes. YAWL has been developed at the Technical University of Eindhoven. For YAWL, a specialized software platform has been developed. It supports text and graphic modes for constructing business processes and tools for their implementation. The software source code is licensed under the GNU Lesser General Public License (LGPL).
WS-CDL was created by the W3C based on XML to specify peer-to-peer interaction of Web services based on choreography, that is, the ordered exchange of messages between external entities. Service specifications define the connections between heterogeneous computing environments used to develop and host Web applications. In general, the provision of Web service choreography enables interoperable peer-to-peer interaction between any services, regardless of the supporting platform or programming model used to implement them.
By 2003, there was a need to move from the heterogeneous standards of individual consortia to a single standard. Through the efforts of IBM, Microsoft, BEA, OASIS, and other organizations, the development and evolution of BPEL began. As Web services progressed, WSFL and XLANG merged, resulting in a new generation specification language, BPEL4WS 1.1. The BPEL4WS 1.1 language made it possible to extend the Web services interaction model and make it applicable to the representation of business transactions.
With the advent of BPEL, the list of standards that are widely used in practice is shrinking. Formally, BPEL and XPDL are considered to provide the orchestration of interactions between internal and external process entities, while WS-CDL and ebXML provide the choreography. However, the functionality of BPEL and XPDL allows us to describe choreography as well.
In general, BPEL 2.0 [41] defines a model for describing the behaviour of a process in terms of interactions (collections of messages) between processes and their partners (external services). Important additional advantages of BPEL are the following:
WFs can not only call Web services, but also be represented as services themselves.
A wide range of elements for control flow and data handling is available, including elements for defining complex data structures and parallel processes for processing them, loops, branches, subprocesses, and elements for implementing asynchronous interaction of Web services.
The use of WSDL to describe Web service interfaces ensures flexible integration with other software and Web applications.
The detailed description of the WF implements the orchestration of internal and external process entities, and the specification of the messaging process reflects the choreography of external entities (called Web services).
In addition to the standards listed above, an informal working group of various organizations and individuals interested in WF portability is developing the Common Workflow Language (CWL) [42]. The team’s goal is to produce specifications that will enable the scientific community to describe WFs that are powerful, easy to use, portable, and reproducible. CWL uses the capabilities of YAML Ain’t Markup Language (YAML) and JSON to represent a number of WF constructs (such as pipelines). The Docker system is used to containerize application software into portable execution environments. It is intended to describe WF with intensive use of data from research fields such as bioinformatics, medicine, chemistry, physics, and astronomy. Version 1.0 of the CWL language was released on July 8, 2016.
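To give a feel for the CWL constructs mentioned above, the sketch below builds a minimal CWL v1.0 `CommandLineTool` description as a Python dictionary and serializes it to JSON. The tool itself (a wrapper around `echo` with a single `message` input) is a hypothetical illustration and is not taken from the paper.

```python
import json

# Minimal CWL v1.0 tool description; the tool and its input are illustrative.
echo_tool = {
    "cwlVersion": "v1.0",
    "class": "CommandLineTool",
    "baseCommand": "echo",
    "inputs": {
        # The input is passed as the first positional command-line argument.
        "message": {"type": "string", "inputBinding": {"position": 1}},
    },
    "outputs": [],
}

cwl_text = json.dumps(echo_tool, indent=2)
```

The same structure is more commonly written in YAML; JSON is used here only because it can be produced with the standard library alone.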
The development and evolution of standards for describing Web services is represented in retrospect in Figure 1. Information about Web service specification languages in this figure generalizes, clarifies, and complements the diagrams from [18, 19].
Fig. 1. [Images not available. See PDF.]
Evolution of WF specification languages.
FRAMEWORK
Existing WMSs [43], including systems that provide support for the service-oriented paradigm of applied software development [7–9], are widely used to create and apply scientific applications and manage them in a computing environment. Unfortunately, these systems offer poorly developed specialized tools for the continuous integration of developed software, for organizing distributed databases and working with them in the memory of computing nodes, and for testing SBSA components.
In this context, we present a new toolkit, FDE-SWFs, which belongs to the class of WMSs. It is based on the approach to developing distributed applied software packages in an HDCE that is supported by the Orlando Tools (OT) framework [44], and at the same time it significantly extends and complements the OT functionality.
FDE-SWFs consists of the following main subsystems: a user interface and system components for designing and specifying computational models and for managing computational processes and their execution environment. It provides a set of specialized APIs to allow users to access external information and computational resources and systems with which they need to interact during the preparation and execution of computational experiments. The knowledge base of the framework contains the specifications of the computational models of the applications, the WFs, and information about computational resources. The initial data and the results of WF execution are stored in computation databases.
The computational model of an SBSA is described by a structure consisting of the following main elements:
a set of significant parameters of the subject domain;
a set of their admissible types;
a set of software modules representing algorithmic knowledge of the subject domain;
a set of abstract computational operations and data processing operations that reflect the semantics of the algorithmic knowledge in the model;
a set of services that implement abstract operations;
a set of problem statements formulated in procedural or nonprocedural form based on a computational model;
a set of WFs created on the basis of procedural or nonprocedural problem statements;
a set of HDCE resources on which application modules and services are hosted and executed;
a set of relations between the above sets.
In general, five of these sets contain subsets of applied and system objects. Applied objects are created by the application developer and supplemented by predefined system objects designed to support the interaction of modules, services, and WFs with FDE-SWFs components and with external information and computational resources and systems during the preparation and execution of experiments. The computational model designer supports the creation of new operations based on the WFs, as well as the generation of Python programs for WFs, with the subsequent inclusion of the generated programs in the computational model.
The computational model represents the conceptual level of the SBSA computing environment, where the concepts and relations between objects of the software and hardware, software and algorithmic, and service-oriented levels are defined (Fig. 2). It allows application developers and end-users to interact with the HDCE and manage its components at an abstract conceptual level, hiding the details of the organization of the computing processes at other levels. A screenshot of the computational model designer is shown in Fig. 3.
Fig. 2. [Images not available. See PDF.]
Architecture layers of computing environment for service-oriented applications.
Fig. 3. [Images not available. See PDF.]
Screenshot of the computational model designer.
The WF designer supports the construction of WFs on the basis of procedural or nonprocedural problem statements. In the first case, the application developer creates a WF interactively using a predefined set of operators for performing an operation, branching, various types of loops, processing parallel data lists, and other constructs. In the case of a nonprocedural problem statement, the developer defines sets of initial and target parameters. The built-in scheduler then automatically creates a sequence of WF operations to calculate the values of the target parameters. Screenshots of the construction of WFs based on procedural and nonprocedural problem statements are shown in Fig. 4.
Fig. 4. [Images not available. See PDF.]
Screenshots of the scientific WF construction based on the procedural and non-procedural problem formulations.
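The idea of planning from a nonprocedural problem statement can be sketched as follows. This is a generic forward-chaining planner with a backward pruning pass, given only as an illustration; it is not the algorithm of the FDE-SWFs scheduler, and the operation and parameter names are hypothetical.

```python
def plan_workflow(operations, initial, targets):
    """Build an operation sequence that computes `targets` from `initial`.

    operations maps an operation name to (input parameters, output parameters).
    """
    known = set(initial)
    plan = []
    pending = dict(operations)
    # Forward pass: apply every operation whose inputs are already known.
    while not set(targets) <= known:
        ready = [name for name, (ins, _) in pending.items() if set(ins) <= known]
        if not ready:
            raise ValueError("targets cannot be computed from the initial parameters")
        for name in ready:
            _, outs = pending.pop(name)
            known |= set(outs)
            plan.append(name)
    # Backward pass: keep only operations that contribute to the targets.
    needed = set(targets)
    pruned = []
    for name in reversed(plan):
        ins, outs = operations[name]
        if needed & set(outs):
            pruned.append(name)
            needed |= set(ins)
    return list(reversed(pruned))


operations = {
    "load": (["raw"], ["table"]),
    "stats": (["table"], ["mean"]),
    "plot": (["table"], ["figure"]),
    "report": (["mean", "figure"], ["pdf"]),
}
plan = plan_workflow(operations, initial=["raw"], targets=["mean"])
```

For the target `mean`, the planner keeps `load` and `stats` and drops `plot`, which was reachable but does not contribute to the requested result.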
Once the WF construction is complete, the designer can automatically generate the WF specification in BPEL as well as Python program code for executing the WF, so that it can be launched and interpreted independently of the FDE-SWFs environment. The designer also supports visualization of the WF as a bipartite directed graph. The graph contains only two types of vertices (parameters and operations) and two types of arcs between these vertices: an input arc connects an input-parameter vertex to an operation vertex, and an output arc connects an operation vertex to an output-parameter vertex. Examples of visualization of WFs created on the basis of procedural and nonprocedural problem statements are shown in Fig. 5.
Fig. 5. [Images not available. See PDF.]
Visualization of the scientific WF based on procedural (a) and non-procedural (b) problem formulations.
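The bipartite structure described above can be made concrete with a short sketch. The data layout (a dictionary mapping each operation to its input and output parameters) and all names are assumptions made for illustration; parameter and operation names are assumed to be disjoint.

```python
def build_bipartite_graph(operations):
    """Return (parameter vertices, operation vertices, arcs) for a WF."""
    parameters, arcs = set(), []
    for op, (inputs, outputs) in operations.items():
        for p in inputs:
            parameters.add(p)
            arcs.append((p, op))  # input arc: parameter -> operation
        for p in outputs:
            parameters.add(p)
            arcs.append((op, p))  # output arc: operation -> parameter
    return parameters, set(operations), arcs


def arcs_are_bipartite(parameters, operation_names, arcs):
    """Check that every arc joins a parameter vertex and an operation vertex."""
    return all(
        (a in parameters and b in operation_names)
        or (a in operation_names and b in parameters)
        for a, b in arcs
    )


workflow = {
    "load": (["raw"], ["table"]),
    "stats": (["table"], ["mean"]),
}
params, ops, arcs = build_bipartite_graph(workflow)
```

No arc ever connects two parameters or two operations, which is what makes the graph bipartite.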
WF SCHEDULER
Figure 6 shows the HDCE operating scheme. FDE-SWFs is implemented on the Node.js platform. Application developers use the Web interface of the framework to describe the computational model of an SBSA. Using the computational model, the developer creates a WF based on procedural or non-procedural problem formulations.
Fig. 6. [Images not available. See PDF.]
Scheme of application development and use.
WFs can be implemented in the following ways:
as a composition of WSDL services represented in the Python programming language;
as a WPS service, which is implemented by a traditional WF based on executable modules and jobs for external metaschedulers and local resource managers (LRMs), such as Condor DAGMan and HTCondor;
using the standardized declarative language BPEL.
In the latter case, a WF can be used by any external WMS that supports BPEL. Figure 7 shows schemes for WF execution under the control of FDE-SWFs. Within Scheme 1, the end-user formulates the problem on the computational model of the application subject domain using the FDE-SWFs interface. According to the formulated problem, the computation planning module constructs a problem-solving plan in the form of a WF in the internal representation. The conversion module converts the WF into BPEL for use in external WMSs. Next, the interpretation module performs asynchronous parallel execution of a sequence of WF operations represented as a composition of WSDL services. The calculated data obtained as a result of executing the composition of WSDL services is stored in the database. Upon completion of the WF, the end-user can visualize the calculated data for the purpose of their further analysis.
Fig. 7. [Images not available. See PDF.]
Schemes of the scientific WF execution: (a) Scheme 1, (b) Scheme 2, and (c) Scheme 3.
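The asynchronous parallel execution of WF operations in Scheme 1 can be illustrated with a small `asyncio` sketch. It is a simplified model under stated assumptions: `call_service` is a hypothetical stand-in for a remote service call, and each operation is assumed to produce exactly one output parameter. It does not reproduce the FDE-SWFs interpretation module.

```python
import asyncio


async def call_service(name, inputs):
    """Stand-in for a remote service call (hypothetical)."""
    await asyncio.sleep(0.01)
    return f"{name}({', '.join(inputs)})"


async def run_workflow(operations, initial):
    """operations maps a name to (input parameter names, output parameter name)."""
    values = dict(initial)
    pending = dict(operations)
    while pending:
        ready = [n for n, (ins, _) in pending.items() if all(i in values for i in ins)]
        if not ready:
            raise RuntimeError("workflow has unsatisfiable dependencies")
        # Operations whose inputs are all available are called concurrently.
        results = await asyncio.gather(
            *(call_service(n, [values[i] for i in pending[n][0]]) for n in ready)
        )
        for n, result in zip(ready, results):
            values[pending.pop(n)[1]] = result
    return values


ops = {"a": (["x"], "y"), "b": (["x"], "z"), "c": (["y", "z"], "w")}
final = asyncio.run(run_workflow(ops, {"x": "x0"}))
```

Operations `a` and `b` depend only on `x` and therefore run in the same wave, while `c` waits for both of their outputs.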
Unlike Scheme 1, Scheme 2 assumes that WF operations are represented by software modules. The process of executing the WF is as follows. The subject domain expert (end-user) formulates the problem and prepares the initial information and executable modules using the WF scheduler. The expert then selects the necessary resources (resources of public access supercomputer centers, cloud platforms, the expert's own high-performance servers, etc.), taking into account quotas for their use and methods for accessing them. The scheduler automatically creates the WF and generates a job specification. The containerization subsystem allocates the required resources, prepares images according to resource classes and WF modules, and then launches containers on these resources (Fig. 8). When the resources are ready to launch the WF, a message is sent to the LRM (e.g. HTCondor or PBS Torque), which is also embedded into the images. Upon completion of the WF execution, the metascheduler sends the obtained results to the computation database. The end-user can also visualize the calculated data.
Fig. 8. [Images not available. See PDF.]
Scientific WF execution in a containerized computing environment.
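As an illustration of what a generated job specification for an external metascheduler might look like, the sketch below emits a DAGMan input file using the `JOB` and `PARENT ... CHILD` statements. The paper does not specify the format produced by FDE-SWFs, so the choice of DAGMan, the job names, and the submit file names are assumptions.

```python
def to_dagman(jobs, dependencies):
    """Render a DAGMan input file.

    jobs maps a job name to its submit description file;
    dependencies is a list of (parent, child) pairs.
    """
    lines = [f"JOB {name} {submit}" for name, submit in jobs.items()]
    lines += [f"PARENT {parent} CHILD {child}" for parent, child in dependencies]
    return "\n".join(lines) + "\n"


# Hypothetical three-step WF: prepare data, solve, collect results.
jobs = {"prepare": "prepare.sub", "solve": "solve.sub", "collect": "collect.sub"}
dependencies = [("prepare", "solve"), ("solve", "collect")]
dag_text = to_dagman(jobs, dependencies)
```

Each submit description file would itself have to describe the executable module and its resources; that part is omitted here.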
The ability to execute WF using WPS services is implemented by Scheme 3. In this case, WFs are registered as WPS services. FDE-SWFs automates the creation, registration, and use of WPS services. In particular, FDE-SWFs automatically registers software modules and WFs as asynchronous WPS services in the appropriate directories on the geoportal. WFs can contain calls to other WPS services, allowing us to work with sets of services. FDE-SWFs supports the ability to exchange files between WPS services as their parameters, including data exchange with the geoportal data storage system.
Figure 9 shows a graph of the development and execution time of a test application with two WFs in OT and FDE-SWFs for all three schemes presented above, with the time accumulated over the stages.
Fig. 9. [Images not available. See PDF.]
Test application development and execution makespan.
The following stages of development and use of the application were taken into account for one of the problems of studying the properties and functioning of an energy infrastructure model:
description of the computational model (stage 1);
design of the WF based on the procedural formulation of the problem (stage 2);
design of the WF based on a nonprocedural formulation of the problem (stage 3);
configuration of the HDCE resources (stage 4);
input of initial data (stage 5);
launch and execution of the WFs (stage 6);
obtaining the computation results (stage 7);
visualization of the results (stage 8).
When working with services in OT, the time required to create them manually was also taken into account. In FDE-SWFs, services are created automatically to execute application software modules. The results of the execution time benchmarking show that the advancement of OT functionality in FDE-SWFs has significantly reduced the execution time for all three schemes.
COMPARATIVE ANALYSIS
Within the comparative analysis, the following FDE-SWFs capabilities are considered:
WF representation in the form of a directed graph with cycles and branches ($c_1$);
support for service-oriented WFs ($c_2$);
use of a WF specification standard ($c_3$);
interaction with WPS services ($c_4$);
support for software containerization ($c_5$);
generation of autonomous programs in a basic programming language to execute WFs regardless of the WMS environment ($c_6$).
The level of implementation of each capability in a WMS takes one of the following values: 1 (implemented), 0.5 (partially implemented), or 0 (not implemented). The weights $w_1$ to $w_6$ of the capability demands are set based on the aggregated subjective ratings of the end-users. The results of the comparative analysis are presented in Table 2. The parameter $E$ is the total evaluation of the considered set of WMS capabilities, determined by the formula
$$E = \sum_{i=1}^{n} w_i c_i,$$
where $n$ is the number of capabilities.
Table 2. WMSs capabilities
| WMS | $c_1$ | $c_2$ | $c_3$ | $c_4$ | $c_5$ | $c_6$ | $w_1$ | $w_2$ | $w_3$ | $w_4$ | $w_5$ | $w_6$ | $E$ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| OT | 0.5 | 0 | 0 | 0.5 | 0.5 | 0 | 0.50 | 0.75 | 0.50 | 0.30 | 0.90 | 0.30 | 0.85 |
| Pegasus | 0 | 0 | 0 | 0 | 1.0 | 0.5 | 0.50 | 0.75 | 0.50 | 0.30 | 0.90 | 0.30 | 1.05 |
| Apache Airflow | 0 | 1.0 | 0 | 0 | 1.0 | 0.5 | 0.50 | 0.75 | 0.50 | 0.30 | 0.90 | 0.30 | 1.80 |
| Galaxy | 1.0 | 1.0 | 1.0 | 0 | 1.0 | 0 | 0.50 | 0.75 | 0.50 | 0.30 | 0.90 | 0.30 | 2.65 |
| FDE-SWFs | 1.0 | 1.0 | 1.0 | 0.5 | 0.5 | 1.0 | 0.50 | 0.75 | 0.50 | 0.30 | 0.90 | 0.30 | 2.65 |
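The values in the last column of Table 2 are consistent with a weighted sum of the capability levels. The short sketch below recomputes them from the table data as a check; the variable and function names are illustrative and are not taken from FDE-SWFs.

```python
# Weights of the six capabilities, as listed in Table 2.
weights = [0.50, 0.75, 0.50, 0.30, 0.90, 0.30]

# Capability implementation levels per WMS, as listed in Table 2.
capabilities = {
    "OT": [0.5, 0, 0, 0.5, 0.5, 0],
    "Pegasus": [0, 0, 0, 0, 1.0, 0.5],
    "Apache Airflow": [0, 1.0, 0, 0, 1.0, 0.5],
    "Galaxy": [1.0, 1.0, 1.0, 0, 1.0, 0],
    "FDE-SWFs": [1.0, 1.0, 1.0, 0.5, 0.5, 1.0],
}


def total_evaluation(levels, weights):
    """Return the weighted sum of capability levels for one WMS."""
    return sum(w * c for w, c in zip(weights, levels))


# Round to two decimals to match the precision used in the table.
scores = {
    name: round(total_evaluation(levels, weights), 2)
    for name, levels in capabilities.items()
}

for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

Because the weights are subjective user ratings, the same function can be rerun with a different weight vector to see how the ranking changes for another user community.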
For comparison, we selected OT, which is the parent tool of FDE-SWFs; Pegasus, which is one of the leaders among the traditional WMSs; and Apache Airflow [45] and Galaxy, which represent an actively developing direction of the SBSA support systems. In the context of the considered set of capabilities, the leaders are Galaxy and FDE-SWFs. This is mainly due to the support of service-oriented WFs, the standardization of WF specifications, and the specialization features of these systems. For example, characteristics $c_4$ and $c_6$ have low weights $w_4$ and $w_6$ among end-users of traditional WMSs. However, from the point of view of end-users of FDE-SWFs in the field of energy systems research, these characteristics become more important.
PRACTICAL USE
Currently, FDE-SWFs is successfully used in the development and use of a number of applications for studying the properties and functioning of critical energy infrastructures. In particular, it is used to prepare and conduct large-scale experiments to solve the following problems:
global vulnerability analysis of energy infrastructures [46];
evaluation of the reduction in the productivity of energy systems due to a flow of failures of their elements when major external disturbances occur [46];
profiling and subsequent evaluation of the efficiency of algorithms for determining the reliability of energy systems for different configurations [47];
determination of the most suitable algorithms for structural-parametric optimization of energy infrastructure models at different levels of their territorial and sectoral hierarchy through testing and multicriteria selection of the algorithms under study [47].
Distinctive features of FDE-SWFs in the development of WF-based SBSAs compared to known WMSs are:
the use of the In-Memory Data Grid technology to place distributed databases in the RAM of environment nodes in order to significantly speed up data processing and analysis [46];
the creation of testbeds that provide developers with the tools to conduct experiments to evaluate the quality of processed data and the functioning of applied software, as well as to analyze computation results and other features of the applications under development [47];
WF design using special system operators, including operators for data aggregation and disaggregation, multimethod computations, dynamic computation planning, etc. [47].
FDE-SWFs is also actively used in teaching courses on parallel and distributed computing for undergraduate and graduate students.
FRAMEWORK LIMITATIONS
The degree of applicability of WMSs is largely determined by their limitations in constructing and running WFs, supporting collaboration between application developers and end-users, the operational efficiency of the WMSs themselves, etc. FDE-SWFs mitigates a number of limitations that exist in other WMSs. At the same time, some aspects of using FDE-SWFs require further development and research. The following are some of the limitations of FDE-SWFs.
Currently, FDE-SWFs does not support the execution of combined WFs consisting of operations implemented by modules and operations realized by services. At the HDCE level, FDE-SWFs interacts with Condor DAGMan. HTCondor is used as LRM. We are developing interaction with other metaschedulers and LRMs, taking into account the WF structure [48].
The scientific applications developed using FDE-SWFs currently have a limited number of end-users. The scalability and reliability of FDE-SWFs in different HDCE configurations under a significant increase in the number of end-users are still under research and development.
In addition, we continue to investigate issues related to the execution of large request flows by WPS services representing WFs of the scientific applications developed within the FDE-SWFs.
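To make the interaction with Condor DAGMan mentioned above more concrete, the following sketch renders an acyclic WF as a DAGMan input file using the standard `JOB` and `PARENT ... CHILD` statements. It is only an illustration under stated assumptions: the job names and submit-file names are hypothetical, and the paper does not describe how FDE-SWFs itself generates DAGMan descriptions.

```python
def to_dagman(dependencies, submit_files):
    """Render a DAGMan input file for an acyclic WF.

    dependencies maps each job name to the list of jobs it depends on;
    submit_files maps each job name to its HTCondor submit description file.
    """
    # One JOB statement per WF operation.
    lines = [f"JOB {job} {submit_files[job]}" for job in dependencies]
    # One PARENT ... CHILD statement per job that has prerequisites.
    for job, parents in dependencies.items():
        if parents:
            lines.append(f"PARENT {' '.join(parents)} CHILD {job}")
    return "\n".join(lines) + "\n"


# Hypothetical four-operation WF: one preparation step, two parallel
# simulations, and a final aggregation step.
dependencies = {
    "prepare": [],
    "simulate_a": ["prepare"],
    "simulate_b": ["prepare"],
    "aggregate": ["simulate_a", "simulate_b"],
}
submit_files = {job: f"{job}.sub" for job in dependencies}

dag_text = to_dagman(dependencies, submit_files)
print(dag_text)
```

DAGMan accepts only acyclic graphs, so a WF with cycles would need to be unrolled or driven by an outer controller before it could be expressed this way.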
CONCLUSIONS
As WF-based scientific applications evolve, there is a compelling need to deliver them as services. Service-oriented WMSs often significantly extend the capabilities of traditional systems for similar purposes. They implement a new architecture that meets modern business paradigms for conducting large-scale interdisciplinary research based on WFs and takes full advantage of information and communication technologies.
In this regard, the paper discusses important aspects of the development and use of SBSAs and the implementation of the above-mentioned architecture in HDCE using FDE-SWFs. The distinguishing feature of FDE-SWFs is that it provides a variety of ways to construct and execute WFs. WFs are constructed based on procedural or non-procedural problem statements. They can be implemented as a composition of WSDL services, as a separate WPS service backed by a traditional WF based on executable modules and jobs for external metaschedulers and LRMs, or using the standardized declarative language BPEL. Accordingly, FDE-SWFs implements three schemes for developing and executing WFs. In addition, it is shown that FDE-SWFs provides a number of important additional capabilities compared to well-known service-oriented WMSs. These include the use of the In-Memory Data Grid technology, the creation of testbeds, and the design of WFs using special system operators. Currently, FDE-SWFs is being successfully used in the development of a number of applications for analyzing the performance and vulnerability of energy infrastructures and for studying the efficiency of algorithms for structural-parametric optimization of such infrastructures.
Future research will focus on increasing the degree of automation in the integration and containerization of system and application software to further improve the efficiency of deployment and execution of applications in HDCE, and to optimize the use of computational resources. In addition, methods and tools will be developed to ensure the reproducibility of computational experiments related to the execution of WFs.
FUNDING
The study was supported by the Ministry of Science and Higher Education of the Russian Federation, project no. FWEW-2021-0005 “Technologies for the development and analysis of subject-oriented intelligent group control systems in non-deterministic distributed environments” (reg. no. 121032400051-9).
CONFLICT OF INTEREST
The authors of this work declare that they have no conflicts of interest.
REFERENCES
1 Erwin, D.W.; Snelling, D.F. UNICORE: A grid computing environment. Lect. Notes Comput. Sci.; 2001; 2150, pp. 825-834. [DOI: https://dx.doi.org/10.1007/3-540-44681-8_116]
2 Litzkow, M.J., Livny, M., and Mutka, M.W. Condor–A hunter of idle workstations, Proc. 8th Int. Conf. on Distributed Computing Systems, Institute of Electrical and Electronics Engineers, San Jose, 1988, pp. 104–111. https://doi.org/10.1109/DCS.1988.12507.
3 Deelman, E.; Vahi, K.; Juve, G. Pegasus, a workflow management system for science automation. Future Gener. Comput. Syst.; 2015; 46, pp. 17-35. [DOI: https://dx.doi.org/10.1016/j.future.2014.10.008]
4 Talia, D., Workflow systems for science: Concepts and tools, ISRN Software Eng., 2013, no. 1, p. 404525. https://doi.org/10.1155/2013/404525
5 Da Silva, R.F.; Filgueira, R.; Pietri, I.; Jiang, M.; Sakellariou, R.; Deelman, E. A characterization of workflow management systems for extreme-scale applications. Future Gener. Comput. Syst.; 2017; 75, pp. 228-238. [DOI: https://dx.doi.org/10.1016/j.future.2017.02.026]
6 Brown, A.; Johnston, S.; Kelly, K. Using Service-Oriented Architecture and Component-Based Development to Build Web Service Applications; 2002.
7 Afgan, E.; Baker, D.; Coraor, N.; Chapman, B.; Nekrutenko, A.; Taylor, J. Galaxy CloudMan: Delivering cloud compute clusters. BMC Bioinf.; 2010; 11, pp. 1-6. [DOI: https://dx.doi.org/10.1186/1471-2105-11-S12-S4]
8 Balis, B. HyperFlow: A model of computation, programming approach and enactment engine for complex distributed workflows. Future Gener. Comput. Syst.; 2016; 55, pp. 147-162. [DOI: https://dx.doi.org/10.1016/j.future.2015.08.015]
9 Hilman, M.H.; Rodriguez, M.A.; Buyya, R. Knowledge Management in Development of Data-Intensive Systems; 2021; Boca Raton, FL: CRC Press.
10 Papazoglou, M. Web Services: Principles and Technology; 2008; New York: Pearson Education.
11 Welke, R.; Hirschheim, R.; Schwarz, A. Service-oriented architecture maturity. Computer; 2011; 44, pp. 61-67. [DOI: https://dx.doi.org/10.1109/MC.2011.56]
12 Tsalgatidou, A.; Pilioura, T. An overview of standards and related technology in web services. Distrib. Parallel Databases; 2002; 12, pp. 135-162. [DOI: https://dx.doi.org/10.1023/A:1016599017660]
13 Ananthakrishnan, R.; Chard, K.; Foster, I.; Tuecke, S. Globus platform-as-a-service for collaborative science applications. Concurr. Comput. Pract. Exper.; 2015; 27, pp. 290-305. [DOI: https://dx.doi.org/10.1002/cpe.3262]
14 Foster, I. Globus Online: Accelerating and democratizing science through cloud-based services. IEEE Internet Comput.; 2011; 15, pp. 70-73. [DOI: https://dx.doi.org/10.1109/MIC.2011.64]
15 Foster, I. Globus toolkit version 4: Software for service-oriented systems. J. Comput. Sci. Technol.; 2006; 21, pp. 513-520. [DOI: https://dx.doi.org/10.1007/s11390-006-0513-y]
16 Foster, I.; Kesselman, C. The Grid: Blueprint for a New Computing Infrastructure; 2002.
17 Foster, I.; Kesselman, C. Globus: A metacomputing infrastructure toolkit. Int. J. Supercomput. Appl.; 1997; 11, pp. 115-128. [DOI: https://dx.doi.org/10.1177/109434209701100205]
18 Juric, M.B.; Chandrasekaran, S.; Frece, A.; Hertis, M.; Srdic, G. WS-BPEL 2.0 for SOA Composite Applications with Oracle SOA Suite 11g; 2010.
19 Juric, M.B.; Mathew, B.; Sarang, P.G. Business Process Execution Language for Web Services: an Architect and Developer’s Guide to Orchestrating Web Services Using BPEL4WS; 2006.
20 Kim, S.J., Foundation for composable microservices for rapid synthesis of highly reliable software systems, PhD Thesis, Dallas: Univ. of Texas, 2004.
21 Kim, S., Bastani, F.B., Yen, I.L., and Chen, I.-R., High-assurance synthesis of security services from basic microservices, Proc. 14th IEEE Int. Symp. on Software Reliability Engineering (ISSRE 2003), Denver, 2003, pp. 154–165. https://doi.org/10.1109/ISSRE.2003.1251039.
22 Fielding, R.T., Architectural styles and the design of network-based software architectures, Ph. D. Thesis, Irvine: Univ. of California, 2000.
23 Rajasekar, A., iRODS Primer: Integrated Rule-Oriented Data System, Morgan & Claypool Publ., 2010.
24 Sukhoroslov, O. Building web-based services for practical exercises in parallel and distributed computing. J. Parallel Distrib. Comput.; 2018; 118, pp. 177-188. [DOI: https://dx.doi.org/10.1016/j.jpdc.2018.02.024]
25 Savchenko, D.I., Radchenko, G.I., and Taipale, O., Microservices validation: Mjolnirr platform case study, Proc. 38th IEEE Int. Convention on Information and Communication Technology, Electronics and Microelectronics Conf. (MIPRO), Opatija, 2015, pp. 235–240. https://doi.org/10.1109/MIPRO.2015.7160271.
26 Smirnov, P.A.; Kovalchuk, S.V.; Boukhanovsky, A.V. Knowledge-based support for complex systems exploration in distributed problem solving environments. Commun. Comput. Inf. Sci.; 2013; 394, pp. 147-161. [DOI: https://dx.doi.org/10.1007/978-3-642-41360-5_12]
27 Knyazkov, K.V., Kovalchuk, S.V., Tchurov, T.N., Maryin, S.V., and Boukhanovsky, A.V., CLAVIRE: e-Science infrastructure for data-driven computing, J. Comput. Sci., 2012, vol. 3, no. 6, pp. 504–510. https://doi.org/10.1016/j.jocs.2012.08.006
28 Puzyrkov, D.V.; Podryga, V.O.; Polyakov, S.V. Cloud service for HPC management: Ideas and appliance. Lobachevskii J. Math.; 2018; 39, pp. 1251-1261.
29 Kudryavtsev, A.O., Koshelev, V.K., Izbyshev, A.O., et al., HPC cloud system design and implementation, Proc. ISP RAS, 2013, vol. 24, pp. 13–34. https://ispranproceedings.elpub.ru/jour/article/download/948/673. Accessed June 17, 2024
30 Sorokin, A.A.; Makogonov, S.V.; Korolev, S.P. The information infrastructure for collective scientific work in the Far East of Russia. Sci. Tech. Inf. Process.; 2017; 44, pp. 302-304. [DOI: https://dx.doi.org/10.3103/S0147688217040153]
31 Korolev, S.P.; Sorokin, A.A.; Verkhoturov, A.L.; Konovalov, A.V.; Shestakov, N.V. Automated information system for instrument-data processing of the regional seismic observation network of FEB RAS. Seism. Instrum.; 2015; 51, pp. 209-218. [DOI: https://dx.doi.org/10.3103/S0747923915030068]
32 Shokin, Y.I.; Fedotov, A.M.; Zhizhimov, O.L. Technologies for designing of distributed information systems to support research. Comput. Technol.; 2015; 20, pp. 251-274.
33 Massel, L.V., Massel, A.G., and Tsybikov, A.R., Agent-service approach to building digital twins, Proc. IEEE Int. Russian Smart Industry Conf. (SmartIndustryCon), Sochi, 2024.
34 Bychkov, I.V., Ruzhnikov, G.M., Paramonov, V.V., Shumilov, A.S., Fedorov, R.K., Levi, K.G., and Demberel, S., Infrastructural approach and geospatial data processing services in the tasks of territorial development management, IOP Conf. Ser.: Earth Environ. Sci., 2018, vol. 190, no. 1, p. 012048. https://doi.org/10.1088/1755-1315/190/1/012048
35 Bychkov, I.V., Ruzhnikov, G.M., Fedorov, R.K., Khmelnov, A.E., and Popova, A.K., Organization of digital monitoring of the Baikal natural territory, IOP Conf. Ser.: Earth Environ. Sci., 2021, vol. 629, no. 1. p. 012067. https://doi.org/10.1088/1755-1315/629/1/012067
36 Bychkov, I.V.; Oparin, G.A.; Feoktistov, A.G.; Bogdanova, V.G.; Pashinin, A.A. Service-oriented multiagent control of distributed computations. Automat. Remote Control; 2015; 76, pp. 2000-2010. [DOI: https://dx.doi.org/10.1134/S0005117915110090]
37 Feoktistov, A.; Gorsky, S.; Kostromin, R.; Fedorov, R.; Bychkov, I. Integration of web processing services with workflow-based scientific applications for solving environmental monitoring problems. ISPRS Int. J. Geo-Inf.; 2022; 11, 8. [DOI: https://dx.doi.org/10.3390/ijgi11010008]
38 Kostromin, R.; Basharina, O.; Feoktistov, A.; Sidorov, I. Microservice-based approach to simulating environmentally-friendly equipment of infrastructure objects taking into account meteorological data. Atmosphere; 2021; 12, 1217. [DOI: https://dx.doi.org/10.3390/atmos12091217]
39 Yoo, T.J. State of the art in business process modeling and execution standard. Adv. Sci. Lett.; 2016; 22, pp. 3650-3653. [DOI: https://dx.doi.org/10.1166/asl.2016.7904]
40 Wohlstadter, E., Tai, S., Mikalsen, T., Diament, J., and Rouvellou, I., A service-oriented middleware for runtime web services interoperability, Proc. IEEE Int. Conf. on Web Services, Chicago, 2006, pp. 1–8. https://doi.org/10.1109/ICWS.2006.13.
41 Web Services Business Process Execution Language Version 2.0. https://docs.oasis-open.org/wsbpel/2.0/wsbpel-v2.0.pdf. Accessed June 17, 2024.
42 Common Workflow Language (CWL). https://www.commonwl.org. Accessed June 17, 2024.
43 Feoktistov, A.; Edelev, A.; Tchernykh, A.; Gorsky, S.; Basharina, O.; Fereferov, E. An approach to implementing high-performance computing for problem solving in workflow-based energy infrastructure resilience studies. Computation; 2023; 11, 243. [DOI: https://dx.doi.org/10.3390/computation11120243]
44 Tchernykh, A.; Bychkov, I.; Feoktistov, A.; Gorsky, S.; Sidorov, I.; Kostromin, R.; Edelev, A.; Zorkalzev, V.; Avetisyan, A. Mitigating uncertainty in developing and applying scientific applications in an integrated computing environment. Program. Comput. Software; 2020; 46, pp. 483-502. [DOI: https://dx.doi.org/10.1134/S036176882008023X]
45 Apache Airflow. https://airflow.apache.org/. Accessed June 17, 2024.
46 Feoktistov, A.G.; Kostromin, R.O.; Voskoboinikov, M.L.; Li-De, D.I. Implementation of computing environment implementation for developing and applying scientific workflows based on containerization. Comput. Technol.; 2023; 28, pp. 151-164. [DOI: https://dx.doi.org/10.25743/ICT.2023.28.6.013]
47 Danilov, G. and Voskoboinikov, M., Testbed-based approach to testing a library for evaluating network reliability algorithms, in Proc. Int. Workshop on Critical Infrastructures in the Digital World (IWCI-2024), ESI SB RAS, 2024, pp. 3–4.
48 Alaasam, A.B.; Radchenko, G.; Tchernykh, A. Refactoring the monolith workflow into independent micro-workflows to support stream processing. Program. Comput. Software; 2021; 47, pp. 591-600.
Copyright Springer Nature B.V. Dec 2024