Journal of Intelligent Manufacturing 12, 199–212, 2001. © 2001 Kluwer Academic Publishers. Manufactured in The Netherlands.
Distributed shared memory infrastructure for virtual enterprise in building and construction*
FADI SANDAKLY,1 JOÃO GARCIA,2 PAULO FERREIRA2 and PATRICE POYET1
1CSTB, BP 209, 06904 Sophia-Antipolis, France
E-mail: [email protected]; [email protected]
2INESC, Rua Alves Redol 9-6, 1000 Lisboa, Portugal
E-mail: [email protected]; [email protected]
Received December 1999 and accepted April 2000
This paper proposes a new approach to building a virtual enterprise (VE) software infrastructure that offers persistence, concurrent access, coherence and security on a distributed datastore based on the distributed shared-memory paradigm. The platform presented, persistent distributed store (PerDiS), is demonstrated with test applications that show its suitability and adequate performance for the building and construction domain. In particular, the successful adaptation of standard data access interface (SDAI) to PerDiS and a comparison with CORBA are presented as examples of the potential of the distributed shared-memory paradigm for the VE environment.
Keywords: Virtual enterprise, building design, persistent distributed store
1. Introduction
The software infrastructure is still one of the major difficulties for concurrent engineering development, especially in large-scale projects where participants are geographically dispersed and belong to different organisations. To build such an infrastructure, traditional approaches based on remote object invocation like Corba, Microsoft DCOM and Java RMI present different limitations when applications manipulate large amounts of data. Among these limitations, three major ones should be highlighted: (i) performance, (ii) security and (iii) difficulties in porting legacy applications.
In fact, remote object invocation platforms on a wide area network (WAN) or local area network (LAN) penalize data sharing between the organizations of a virtual enterprise (VE). These software platforms also lack a homogeneous security model defined on data and reflecting the complex security relations between partners of a VE. Finally, most of the time, porting existing legacy applications to co-operate in a VE implies their complete re-engineering in order to get decent performance, to make them work concurrently and to integrate security constraints.
This paper proposes a new approach to building a VE software infrastructure (Sandakly, 1999). This approach generated a platform of persistent, distributed and shared memory called PerDiS.1 In PerDiS, memory is shared between all applications, even located at different sites or running at different times. This shared memory represents the shared store of a VE. Coherent caching of data improves performance and availability. It ensures that applications have a consistent view of data, and frees developers from manually managing objects' location. Using the shared memory paradigm facilitates application porting on top of PerDiS. There is no need to change data structures to make them persistent and/or distributed. To allow concurrent access to data, the PerDiS platform provides transactions and locking mechanisms for connected environments and a check-out/check-in mechanism for disconnected ones. Locking can be handled transparently by PerDiS or explicitly by the applications.
*This work has been partially financed by the EU PerDiS project (Esprit 22533).
For security, PerDiS implements a task/role model to manage security attributes and access rights to data. Communication between distant machines can be secured by encrypting messages. Security refers to data and users independently from applications.
The next section outlines the problems of developing software infrastructure for collaborative engineering in the context of VE and how PerDiS solves these problems. After that, the major concepts and implementation aspects of the PerDiS platform are detailed. This is followed by a section describing how the standard data access interface (SDAI) of the STEP ISO-10303 norm (ISO 10303-22, 1996) for product data representation and exchange was adapted to meet the requirements of a VE in the building and construction (B&C) sector. Finally, in the Experiments section a performance comparison between PerDiS and the well-known Corba approach is presented.
2. Co-operative engineering and virtual enterprise
Today, the quickly changing global open market pushes companies to react increasingly quickly and to adapt and modify their products. To achieve this goal, suppliers as well as contractors from different companies have to be tightly involved in the design and the production cycles. Co-operative or concurrent engineering (CE) (ISO 10303-22, 1996) (Wilbur, 1994) techniques have been generalized, giving birth to a new form of collaborative work applied at company level instead of at personal level. The term VE (Camarinha-Matos, 1999) refers to this kind of consortium of companies. A VE can be defined as a temporary alliance of independent organizations that come together to quickly exploit a product manufacturing opportunity. These organizations have to develop a working environment to manage all or part of their different resources toward the attainment of their common goal. Obviously, common information definition and sharing is the essential problem of the VE (Hardwick, 1996). Actually, partners of a VE usually have different business rules and information infrastructures. Being part of a VE means that a company has to adapt its information system (or part of it) to the VE common information infrastructure in order to share and exchange project data with other partners.
The main issues of such infrastructure are:
(1) Scalability.
(2) Evolution.
(3) Ease of use and adaptation.
Regarding scalability, the architecture of a VE infrastructure has to be independent of the number of partners, the projects they work on, and the type and amount of data they manipulate. Furthermore, due to the fast evolution of information technologies (IT), this architecture has to be open and easy to evolve. At the same time it has to be simple to use in order to reduce the cost of adapting the IT infrastructure of each partner to it. The main challenges of developing this kind of infrastructure are:
* Definition of a common data model representing the information to be exchanged and shared between partners.
* Definition of a common sharing software infrastructure that can ensure data storage, data integrity, and data security.
* Adapting the existing IT infrastructure of each partner in the VE to work with the common data model and the common sharing software infrastructure.
2.1. VE in building and construction
The B&C sector is intrinsically distributed (Kalay, 1998). Generally, participants on a construction project belong to several independent companies. More than 80% of these companies are SMEs (with an average of 20 employees). Due to their size, these companies are not able to invest much in modifying their IT infrastructure each time they work on a new project. Several aspects characterize the co-operative engineering in VE in the B&C sector:
(1) Short-term co-operation implies simple and fast setup of computer infrastructure for data sharing. Except for very big projects and some special actors like the owner of a building, companies operate for a limited time in a VE, especially during the design phase. For instance, architects, structure engineers, HVAC (heating, ventilation and air conditioning) engineers and electrical engineers work for a short period on a project just to deliver the plans to be used by the construction companies. These actors stay involved in the project until the end but they become less active than others do. This relatively short-term co-operation aspect constrains the companies to set up and adapt their computer infrastructure quickly to be able to share data with the other members of a given VE. Sharing data in a VE means being able to pick up data coming from other partners in a variety of formats and being able to exploit them without semantic loss.
(2) Long transactions in the conception cycle and the permanent need of data availability. The conception process is cyclic. Starting from a given version of the project plans, designers add new elements and modify existing ones. Modifications made in parallel may be conflicting. Usually, a reconciliation phase between actors allows them to validate a new version of the plans and to restart a new conception cycle. Conception cycles are relatively long (in the order of weeks) and a user working session on a project can be relatively long (a few hours). This working mode stimulates the notion of a disconnected work session. There is no need to keep a remote connection open when the user is working with a local copy of the data for a few days. Check-out/check-in mechanisms seem well adapted to the remote data access. Furthermore, attention must be paid to data availability. When a user is working on part of a project, other users cannot modify it. However, they must be able to access the latest valid version of this part without being blocked by another user's transaction. Another issue raised by long transactions is fault tolerance: users cannot accept losing a couple of hours of work because of a remote machine's crash, a network breakdown or simply because of inappropriate locking of remote data.
(3) Large datasets. Another important characteristic of B&C design applications is the large amount of data in a project (just think about the number of different objects that exist in a building like walls, windows, doors, stairs, electrical components, sanitary installations...).
(4) Data security, ownership and responsibility. The legal responsibility of data is an important issue in VE, mainly for engineering aspects related to the human security (fire, material resistance...). Sharing data across open networks (like the Internet) must take into account the authentication of received data and its security against modifications. Only authorized persons may access and change pieces of data. Besides the legal responsibility, financial aspects increase the weight of security issues. In fact, the VE model encourages the development of a unique data store for a whole project including financial information. Protecting this data is an essential concern for all partners.
2.2. Traditional approaches to VE software
During the last few years, an important effort has been undertaken in different research projects to define the software infrastructure of the future VE. Among the significant research efforts in this domain is the NIIIP project (NIIIP, 1996) that aims at developing open industry software protocols that allow manufacturers and their suppliers to effectively interoperate as if they were part of the same enterprise. NIIIP bases its architecture on emerging standards (formal and de facto) like STEP ISO-10303 (Fowler, 1995) for data modelling and Corba (OMG, 1997; Mowbray, 1996) as a middleware for application interoperation. Influenced by the OMG and Corba formalisms, NIIIP views VE activity as a set of services offered by partners with interfaces defined with the IDL language and based on Corba standard services (OMG, 1998) like naming, persistence, security, transaction, etc.
At the European level, an equivalent effort has been realized with the VEGA Esprit project that aims to establish an information infrastructure to support the technical and business operations of virtual or extended enterprises. This information infrastructure relies on the Corba Access to STEP models (COAST) architecture (Koethe, 1997), which allows applications to access data using an API that extends the SDAI (ISO 10303-22, 1996) of STEP. Like NIIIP, the VEGA COAST platform provides services that can be used by VE partners such as the conversion service allowing the mapping between different data schemas and the Workflow service to manage projects. Another effort has been undertaken at the University of Salford (Koethe, 1997). In this work, a three-tier architecture is defined based on Corba services and ObjectStore OODBS for data persistence. The ISO STEP modeling language is used to define the data model. Applications can exchange data using files or can share fine-grain objects whose interfaces are defined in IDL.
While ISO STEP seems to offer a modeling language and methodology, as well as data models that are largely accepted and used in different manufacturing sectors (aerospace, building and construction, electronics...), the Corba approach presents, from a practical point of view, some difficulties to building VE software infrastructure. Those difficulties can be summarized as follows:
* Currently, few of the commercial object request brokers (ORB) implement all services needed for a real VE like security, transaction, concurrency control and persistence.
* Corba is based on remote method invocation. With this approach, objects used by an application reside in a remote host and can only be accessed via their functional interface. This is a disadvantage when objects are frequently accessed (which is the case in most design tools, e.g., CAD) because an important amount of processing time is wasted in communications. This increases non-useful network traffic considerably.
* Most of the Corba services like concurrent access, persistence and security are defined at object level. This means that important problems like concurrent data access, security and data distribution (which is a major issue in application performance because of the remote method invocation mechanism) have to be solved in the early phases of an application's design.
* VE infrastructure has to integrate existing applications from different partners. Those applications have to access common data stores and thus have to be interfaced using Corba mechanisms. These require a deep modification in the application's data structure (the inheritance graph has to be changed to access some Corba services) and sometimes a new code structure in order to provide their functionality as a service.
3. The PerDiS approach
The PerDiS platform has been developed to overcome many limitations of traditional approaches in terms of performance, security mechanisms, distribution capabilities and ease of use. The use of the distributed shared-memory paradigm as the base for the PerDiS implementation notably improves performance in comparison to the remote invocation approach. Furthermore, it facilitates porting of existing applications without major modifications. Additionally, transactions can be transparent to applications. PerDiS offers a default behavior where data locking is done implicitly depending on the application's access mode to each datum in the store. This implicit behavior simplifies the extension of a single-user application to a concurrent application where several users can share the same data, since PerDiS guarantees data integrity with its locking and transactional mechanisms.
In comparison to other object distribution approaches based on remote invocation like Corba, Microsoft DCOM and Java RMI, which impose small-grain distribution, PerDiS allows applications to choose their own distribution granularity based on clusters (a cluster is a set of objects of variable size). As said before, B&C applications, like CAD systems, manipulate large amounts of data. Applications using remote calls to access each object attribute spend most of the execution time in network communications. With PerDiS the whole object (more precisely, a bunch of objects) is transferred once from the remote site and mapped into the memory of the local application. The network cost is greatly reduced because all object accesses are done in the application memory.
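The argument above can be made concrete with a back-of-the-envelope cost model. The following is only a sketch: the function names and every number in it (access count, round-trip time, cluster size, bandwidth) are illustrative assumptions, not measurements from PerDiS or Corba.

```python
# Rough cost model; all figures are assumed for illustration only.

def remote_invocation_time(n_accesses, rtt_s):
    """Each attribute access is a remote call, so it costs one round trip."""
    return n_accesses * rtt_s


def cluster_fetch_time(n_accesses, rtt_s, cluster_bytes, bandwidth_bps,
                       local_access_s):
    """One round trip plus one bulk transfer, then purely local accesses."""
    transfer_s = cluster_bytes * 8 / bandwidth_bps
    return rtt_s + transfer_s + n_accesses * local_access_s


n = 100_000   # attribute accesses made by a CAD-like tool (assumed)
rtt = 0.001   # 1 ms round trip on a LAN (assumed)

remote = remote_invocation_time(n, rtt)
local = cluster_fetch_time(n, rtt,
                           cluster_bytes=10_000_000,    # 10 MB cluster
                           bandwidth_bps=100_000_000,   # 100 Mbit/s link
                           local_access_s=1e-7)         # in-memory access
print(f"remote invocation: {remote:.2f} s, cluster fetch: {local:.2f} s")
```

Under these assumptions the remote variant is dominated by round trips, while the cluster variant pays mostly for the single bulk transfer. The balance shifts if only a handful of objects in a large cluster are ever touched.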
Regarding security, PerDiS defines security attributes on the datastore, making them independent from different applications. All these aspects will be detailed in the rest of this section to show how they offer coherent data sharing, persistence and security and how they reduce time and programming effort when porting applications on top of PerDiS.
3.1. PerDiS architecture
A PerDiS system (Shapiro, 1997) consists of a set of machines running two kinds of processes: application processes and PerDiS daemons (PD). There is a PD on each machine. Applications communicate with their local PD through a user level library (ULL) which they access via the PerDiS application programming interface (API). The ULL deals with application-level memory mapping, data transformations and management of objects, locks and transactions. When the application, through the API and the ULL, requests locks or accesses data, it makes requests to the local PD, which deals with caching, issuing locks and data, storage, transactions, security and communication with remote machines. A typical configuration is shown in Fig. 1. Note that application processes are optional. In fact, the PerDiS architecture is a symmetrical client-server architecture in which each application is a client of the local server and all servers interact in a peer-to-peer mode. A machine with just a PD running behaves as a pure server.
3.2. Objects and clusters
Objects in PerDiS are sequences of bytes representing some data structure. They are not limited to, e.g., C++ (Stroustrup, 1986) objects. An application programmer allocates objects in a cluster, which is a physical grouping of logically related objects. Clusters have a variable (unlimited) size. In contrast with current technology like Corba (OMG, 1997), clusters are the user-visible unit of naming, storage, distribution and security, allowing efficient and large-scale data sharing applications. In fact, a cluster combines the properties of a heap (programs allocate data in it) and of a file (it has a name and attributes, and its contents are persistent). Programmers use URLs (Berners-Lee, 1994) to refer to clusters, e.g., pds://perdis.esprit.ec.org/clusters/floor1.
An object in some cluster may refer, using standard pointers, to some other object in the same cluster or in another one, even when the current machine does not actually hold the pointed-to object. While navigating through an object structure, an application may implicitly access a previously "unknown" cluster, which may be on a distant machine. For instance, Fig. 2 shows two clusters located at two different machines. Starting from the entry point start in the leftmost cluster, one can navigate through the objects, e.g., start->a->b->print(), implicitly accessing the remote cluster. The same function could have been called by first opening the remote cluster and then calling first->print().
3.3. Persistence
Persistence in PerDiS is based on persistence by reachability (PBR) (Atkinson, 1983): an object is persistent if and only if it is transitively reachable from a persistent root. A persistent root is a distinguishable, named reference, which is persistent by default. To illustrate PBR, reconsider Fig. 2: all objects are reachable from the roots start and first, and thus they are persistent. If the pointer first->c is set to NULL, objects X and Y would no longer be reachable, and would automatically be deleted. Destroying the root objects start and first would delete all objects in both clusters.
The PBR model has two main advantages: it makes persistence transparent and frees programmers from memory management. Persistence is transparent since it is deduced from the reachability property. It frees the programmer from memory management because programmers only allocate memory; deallocation is performed by the system if necessary. This prevents dangling pointers and memory leakage. When porting existing code, only the allocation operator has to be modified to allocate data in persistent memory. There is no need to modify or extend data structures to make them persistent.
Fig. 1. PerDiS architecture.
PBR is implemented with a garbage collector (GC). The GC runs as a thread of the PD. It scans the store regularly to eliminate unreachable parts of the persistent store.
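Persistence by reachability amounts to a graph traversal from the named roots. The sketch below shows that idea on a toy heap; it is not the PerDiS collector (which runs concurrently and across machines), and the object names and pointer layout are made up for illustration rather than taken from Fig. 2.

```python
def reachable(roots, heap):
    """Return the ids of all objects transitively reachable from the roots.

    roots maps a root name to an object id (None for a cleared root);
    heap maps an object id to the list of object ids it points to.
    """
    seen = set()
    stack = [oid for oid in roots.values() if oid is not None]
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue            # already visited; also handles cycles
        seen.add(oid)
        stack.extend(heap.get(oid, []))
    return seen


def collect(roots, heap):
    """Keep only the objects that are still reachable from a root."""
    live = reachable(roots, heap)
    return {oid: refs for oid, refs in heap.items() if oid in live}


# Hypothetical store: two named roots and a chain of objects.
roots = {"start": "s", "first": "f"}
heap = {"s": ["a"], "a": ["b"], "b": ["f"],
        "f": ["x"], "x": ["y"], "y": []}

heap["f"] = []                  # clear the only pointer leading to x
after = collect(roots, heap)
print(sorted(after))            # x and y are no longer persistent
```

The application never frees `x` or `y` explicitly; they disappear because nothing persistent points to them any more.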
3.4. Caching and replication
Data distribution in PerDiS is based on lazy replication and co-operative caching. Lazy replication means that the system only makes copies of data when an application accesses them. Co-operative caching means that caches of different PDs interact to fetch the data an application accesses. Replication and caching avoid the remote access bottleneck because all data access is local. In addition, since replicas are kept in several co-operative caches, data requests are spread over several nodes, preventing data access bottlenecks. A potential drawback of sharing implicitly sized units of data is false sharing. False sharing occurs when two applications access different data items that happen to be stored in the same unit of locking. Each application has to wait until the other has unlocked the data before it can access it, although there is no real sharing.
Another important aspect of caching implemented in PerDiS is the prefetching mechanism. In fact, PerDiS allows an application to specify a prefetching strategy depending on its behavior. For instance, the application can define a set of clusters that the cache prefetches when a given cluster is opened by the application. This mechanism enhances data availability, especially when the data is located on a remote machine with a slow network connection.
3.5. Transactions
The purpose of transactions is to guarantee fault tolerance and concurrency control to PerDiS applications.
PerDiS provides transactions with the usual ACID semantics (atomicity, consistency, isolation, and durability) while supporting both optimistic and pessimistic concurrency controls. Pessimistic concurrency control enforces locking when data is accessed. Locked data is unlocked when the transaction is committed. Optimistic concurrency control allows users to access data concurrently without locking. Conflicts between optimistic transactions are detected through data versioning at commit time. Data is marked with a version stamp every time it is written.
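Commit-time conflict detection with version stamps can be sketched in a few lines. This is a generic illustration of optimistic concurrency control under simplifying assumptions (single process, per-key version stamps, hypothetical class and key names); it does not reproduce the PerDiS commit protocol.

```python
class VersionedStore:
    """Maps each key to a (value, version) pair; version 0 means absent."""

    def __init__(self):
        self.data = {}

    def read(self, key):
        return self.data.get(key, (None, 0))


class OptimisticTxn:
    """Reads and writes without locking; validates versions at commit."""

    def __init__(self, store):
        self.store = store
        self.seen = {}      # key -> version stamp observed by this txn
        self.writes = {}    # key -> new value, applied only on commit

    def read(self, key):
        if key in self.writes:
            return self.writes[key]
        value, version = self.store.read(key)
        self.seen.setdefault(key, version)
        return value

    def write(self, key, value):
        self.seen.setdefault(key, self.store.read(key)[1])
        self.writes[key] = value

    def commit(self):
        # Conflict if any key changed since this transaction first saw it.
        for key, version in self.seen.items():
            if self.store.read(key)[1] != version:
                return False            # abort: nothing is applied
        for key, value in self.writes.items():
            current = self.store.read(key)[1]
            self.store.data[key] = (value, current + 1)   # new stamp
        return True


store = VersionedStore()
store.data["wall.height_mm"] = (2500, 1)

t1, t2 = OptimisticTxn(store), OptimisticTxn(store)
t1.write("wall.height_mm", t1.read("wall.height_mm") + 100)
t2.write("wall.height_mm", t2.read("wall.height_mm") + 500)

first_ok = t1.commit()    # no one committed in between: succeeds
second_ok = t2.commit()   # the stamp moved from 1 to 2: conflict
print(first_ok, second_ok, store.data["wall.height_mm"])
```

The losing transaction learns about the conflict only at commit time, which is why long design transactions benefit from the notification and reconciliation facilities described in this section.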
Furthermore, the PerDiS model allows for non-serializable views of data (private copies), for notifications and for reconciliation transactions. The broad range of these transactional facilities is motivated by the application domain of co-operative engineering. Interactive project development applications (e.g., CAD) may issue long transactions which are unlikely to involve write conflicts but which, due to their length and complexity, users will want to avoid aborting at all costs.
Transactions using implicit locking automatically lock data accessed by the application. Data read by an application is protected by a read lock, whereas data that is to be modified is protected by a write lock. Modifying data that is already protected by a read lock causes an upgrade of the lock from read to write. Implicit locking is mainly provided to make porting existing applications to PerDiS easy. However, it reduces concurrency, because it is done at memory page granularity, and increases the risk of deadlocks. Therefore, when developing new applications, explicit locking is preferred. Explicit locking transactions do not use automatic locking, but rely on explicit object intent requests, using functions like lock() and unlock().
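The read-to-write upgrade and the deadlock risk it creates can be illustrated with a small page-level lock table. This is a non-blocking toy (requests return True or False instead of waiting), and the class and method names are hypothetical; they are not the PerDiS lock() and unlock() functions.

```python
class LockTable:
    """Shared read locks and exclusive write locks, keyed by page number."""

    def __init__(self):
        self.readers = {}   # page -> set of transaction ids holding a read lock
        self.writer = {}    # page -> transaction id holding the write lock

    def read_lock(self, txn, page):
        holder = self.writer.get(page)
        if holder is not None and holder != txn:
            return False                    # someone else is writing
        self.readers.setdefault(page, set()).add(txn)
        return True

    def write_lock(self, txn, page):
        holder = self.writer.get(page)
        if holder is not None and holder != txn:
            return False
        if self.readers.get(page, set()) - {txn}:
            return False                    # other readers block the upgrade
        self.writer[page] = txn
        return True

    def release_all(self, txn):
        for readers in self.readers.values():
            readers.discard(txn)
        for page in [p for p, t in self.writer.items() if t == txn]:
            del self.writer[page]


locks = LockTable()
locks.read_lock("T1", page=7)
upgraded = locks.write_lock("T1", page=7)   # sole reader: upgrade granted
blocked = locks.read_lock("T2", page=7)     # T2 has to wait for T1
locks.release_all("T1")

# Both transactions read page 8, then both try to modify it.
locks.read_lock("T1", page=8)
locks.read_lock("T2", page=8)
t1_up = locks.write_lock("T1", page=8)
t2_up = locks.write_lock("T2", page=8)
print(upgraded, blocked, t1_up, t2_up)
```

In the last step each transaction waits for the other to drop its read lock, which is the classic upgrade deadlock. Because the lock unit is a page, the two transactions may even be modifying unrelated objects that merely share the page.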
When starting a transaction, a programmer may indicate that the application must be notified when the involved data is accessed, modified or committed by other transactions. This allows the development of reactive applications. Applications can, for instance, react to these types of events by updating their cached data using a "refresh" function and initiating a new transaction in order to reconcile with concurrent users.
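A reactive application of this kind can be pictured as a callback registered on the data it depends on. The sketch below is a minimal observer pattern with hypothetical names (`NotifyingStore`, `subscribe`, `commit`); how PerDiS actually delivers notifications is not specified in this section.

```python
class NotifyingStore:
    """Invokes subscriber callbacks when a commit touches watched keys."""

    def __init__(self):
        self.values = {}
        self.subscribers = {}   # key -> list of callbacks

    def subscribe(self, key, callback):
        self.subscribers.setdefault(key, []).append(callback)

    def commit(self, txn_id, updates):
        self.values.update(updates)
        for key in updates:
            for callback in self.subscribers.get(key, []):
                callback(key, txn_id)


store = NotifyingStore()
stale = []   # keys whose cached copy this application should refresh

store.subscribe("door.width", lambda key, txn: stale.append((key, txn)))
store.commit("T2", {"door.width": 900, "door.height": 2100})
print(stale)
```

Only the watched key produces an event; the application can then refresh its cached copy and start a reconciliation transaction.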
All these functionalities are further extended in PerDiS through its two-level architecture. In PerDiS, there is a distinction between LAN and WAN. For each LAN, there is a node responsible for interacting with PerDiS servers on remote networks. This site, called a gateway, includes a file cache and provides any files from machines outside the LAN to local area PerDiS applications. Each PerDiS server manages a multi-version store where a sequence of versions for each file is kept as they are submitted for commit by PerDiS applications. Executing transactions on a WAN using PerDiS does not require that all servers involved are synchronized. PerDiS uses a transaction commit algorithm derived from MVGV (Agrawal, 1987) that synchronizes only involved servers at commit time. Furthermore, this algorithm allows the file cache to give coherent views of data to applications without having any knowledge of transactions. The PerDiS transactional protocol extends notifications and reconciliation functionalities to wide area transactions.
Fig. 2. Clusters, objects and remote references.
3.6. Security
Protecting data in VEs is important for two reasons. First, project data often represents a large asset due to the amount of work needed to create it. Losing data means losing money. Second, data often represents knowledge and provides companies with a competitive edge. Due to the nature of a VE, partners co-operating in one project may be competing in another. Therefore, PerDiS provides security means (Coulouris, 1997) to protect data in a collaborative environment. Security in PerDiS consists of two parts: data access control and secure communication. Data access is controlled by groupware-oriented access rights based on users' tasks and roles. PerDiS clusters are assigned to a particular task and one can specify access rights for a user having a specific role in a specific task. Access rights can be assigned on a cluster basis to reduce management overheads. When using a secure PerDiS application, users are associated to their role in the task by logging on to a PerDiS security tool.
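The task/role model described above can be read as three lookups: cluster to task, (user, task) to role, and (task, role) to rights. The following sketch encodes exactly that reading; the class, the right names, the users and the cluster URL are hypothetical and not taken from the PerDiS security tool.

```python
class AccessControl:
    """Access rights attached to (task, role) pairs, checked per cluster."""

    def __init__(self):
        self.cluster_task = {}   # cluster URL -> task the cluster belongs to
        self.user_role = {}      # (user, task) -> role of the user in that task
        self.rights = {}         # (task, role) -> set of granted rights

    def allowed(self, user, cluster, right):
        task = self.cluster_task.get(cluster)
        role = self.user_role.get((user, task))
        return right in self.rights.get((task, role), set())


acl = AccessControl()
floor1 = "pds://site-a.example/clusters/floor1"   # hypothetical cluster URL
acl.cluster_task[floor1] = "structural-design"
acl.rights[("structural-design", "engineer")] = {"read", "write"}
acl.rights[("structural-design", "reviewer")] = {"read"}
acl.user_role[("alice", "structural-design")] = "engineer"
acl.user_role[("bob", "structural-design")] = "reviewer"

print(acl.allowed("alice", floor1, "write"),
      acl.allowed("bob", floor1, "write"),
      acl.allowed("carol", floor1, "read"))
```

Because rights hang on the cluster and the task rather than on an application, the same check applies whichever tool opens the data, and a user with no role in the task gets no access at all.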
Secure communication uses public key schemes for signed data access requests, shared keys for encryption and a combination of the two for authentication of message originators.
4. Distributing STEP SDAI
To open the PerDiS platform to the ISO standard for the exchange of product model data (STEP) formalism and to allow its integration in the building and construction domain, the authors ported an implementation of the SDAI (ISO 10303-22, 1996) on top of PerDiS. This interface was defined for single-user applications accessing a single local or remote data store. This section details how to extend SDAI so that it deals with data distribution, security and multi-user concurrent access to data stores. The main goal of this work is to facilitate the development of STEP concurrent applications. The size of the SDAI layer is about 50,000 lines of C++ code. This code has been ported to PerDiS in a very short time compared to the effort of its development.
The rest of this section briefly describes STEP and SDAI and is followed by a discussion of the issues involved in developing a distributed SDAI application using PerDiS.
4.1. The international standard for the exchange of product model data
STEP provides a basis for communicating product information at all stages of a product's life cycle. The keyword of data exchange in STEP is data model sharing. In fact, STEP defines tools like the EXPRESS language (Atkinson, 1983) to develop data models that can be used in different applications, allowing interoperability and common data structures for data sharing. EXPRESS is an OO-like language providing mechanisms to model constraints on data like global rules and the uniqueness of object values. STEP also provides a definition for data exchange (ISO 10303-21, 1994), which is an ASCII format that can be used to exchange data defined with EXPRESS. This file-based format is called STEP physical file format (SPF). Finally, for data storage and access, STEP specifies an API called SDAI (ISO 10303-22, 1996) (standard data access interface) that defines the way applications can store and retrieve instances in databases. The goal is that applications sharing the same data model can share databases. However, SDAI does not deal with concurrent access of data or with distribution of databases. Data security is not defined in the SDAI either.
4.2. Architecture of the SDAI
The SDAI is defined in four different schemas written in EXPRESS (see Fig. 3):
* The SDAI dictionary schema includes definitions of entities needed to represent a meta-model of an EXPRESS schema. Instances of these entities that correspond to a given schema constitute the SDAI Data Dictionary.
* The SDAI session schema includes the definition of entities needed to store the current state of a SDAI session started by an application. The information stored is mainly the list of repositories opened by the application and their access mode, current transactions, events and errors.
* The SDAI population schema defines the organization structure of a SDAI population. A SDAI population is the set of instances stored in SDAI repositories. Three main entities to store instances are defined in this schema (see Fig. 4):
(1) SDAI model is a grouping mechanism consisting of a set of related entity instances based upon one schema.
(2) Schema instance is a logical collection of SDAI Models based upon one schema. A schema instance is used as a domain for EXPRESS global rules validation, as a domain over which references between entity instances (in different models) are supported or as a domain for EXPRESS uniqueness validation.
(3) Entity extent groups all instances of an EXPRESS entity data type that exist in a SDAI Model.
* The SDAI parameter data schema describes in abstract terms the various types of instances that are passed and manipulated through the API. It provides definitions for EXPRESS simple types, EXPRESS entity instance, EXPRESS entity attribute value, aggregations and iterators, etc.
Those schemas are independent of any implementation language. The SDAI language bindings (SDAI Implementations) are specified for computing languages like C, C++, Java and IDL.
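The three storage entities of the population schema (model, schema instance, entity extent) can be pictured with a few small classes. This is an illustrative sketch only: the class and method names are hypothetical, the schema name is invented, and none of it follows an actual SDAI language binding.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Instance:
    entity_type: str
    attributes: dict


@dataclass
class SdaiModel:
    """A set of related instances based upon one schema."""
    name: str
    schema: str
    # Entity extents: entity type -> all instances of that type in the model.
    extents: dict = field(default_factory=lambda: defaultdict(list))

    def create(self, entity_type, **attributes):
        instance = Instance(entity_type, attributes)
        self.extents[entity_type].append(instance)
        return instance


@dataclass
class SchemaInstance:
    """A logical collection of models that share one schema."""
    name: str
    schema: str
    models: list = field(default_factory=list)

    def add(self, model):
        if model.schema != self.schema:
            raise ValueError("model is based on a different schema")
        self.models.append(model)

    def unique(self, entity_type, attribute):
        """Uniqueness check over all models of this schema instance."""
        values = [inst.attributes[attribute]
                  for model in self.models
                  for inst in model.extents.get(entity_type, [])]
        return len(values) == len(set(values))


floor1 = SdaiModel("floor1", schema="building_schema")
floor2 = SdaiModel("floor2", schema="building_schema")
floor1.create("wall", tag="W-001")
floor2.create("wall", tag="W-002")

project = SchemaInstance("project", schema="building_schema")
project.add(floor1)
project.add(floor2)
before = project.unique("wall", "tag")

floor2.create("wall", tag="W-001")      # duplicate tag in another model
after = project.unique("wall", "tag")
print(before, after)
```

The uniqueness check spans both models, which is the sense in which a schema instance acts as the validation domain rather than any single model.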
4.3. Porting SDAI on the PerDiS platform
This section discusses the main issues in developing a persistent distributed version of a SDAI, which allows concurrent data access. Some of the design decisions made have been influenced by one of PerDiS' principal goals, which is the fast porting of existing applications that were neither developed to work in a concurrent environment nor in a transactional way. To do this the PerDiS team preserved as far as it was possible the SDAI API described in (ISO 10303-23, 1997) by introducing implicit concurrent and transactional behavior based on some of PerDiS' features. However, SDAI was extended with a specific distribution and concurrency API that can be used by new applications.
4.3.1. SDAI persistence and storage structure
All instances created in the SDAI are made persistent in PerDiS. This is simplified by the fact that in PerDiS no difference exists between persistent and transient object manipulation.
In a SDAI implementation, application instances are stored in logical structures. These structures are placed in the PerDiS persistent distributed store (i.e., in clusters). The main object collector in the SDAI is the model. From the application point of view, a model is a set of related instances. Another important collector is the repository. A repository is a collection of models. These two objects are part of the application data organization and have to be persistent. Schema instances are also logical collectors that contain several models related to the same schema. As shown in Fig. 5, all these objects are mapped to PerDiS clusters, which are logical collections of application objects.
Fig. 3. SDAI architecture.
Fig. 4. SDAI storage structure.
4.3.2. Distribution granularity
In the PerDiS approach, the distribution granularity as seen by the application is a cluster. Physical distribution is hidden and the PerDiS platform can manage this granularity automatically or semi-automatically to optimize performance using prefetching mechanisms (see Section 3.4). From the SDAI point of view, the smallest collector structure is the model. Implementing SDAI models with clusters gives applications fine-grain distribution granularity. This granularity is adjustable because models can contain any number and kind of application instances. Inside a cluster, instances can be grouped using entity extent objects.
4.3.3. Concurrent data access
There is no concurrent access specification in the current definition of the SDAI. As shown in Fig. 5, the SDAI data are divided into two categories: dictionary data and application data. Normally, the dictionary data is read by all applications and is not modified often. The default access to the dictionary is implemented with the local-copy locking mode. This means that the application accesses the last valid version, but modifications to this data are not committed except when the application creates a new EXPRESS schema (which means adding a new data model to the dictionary). In this case, the locking mode is upgraded to read-write and the application can be blocked if another one modifies the dictionary at the same time.
The application data category represents the project data that users modify most frequently. Usually, the project data is organized in different SDAI models. This organization reflects the project decomposition in terms of tasks and users. For instance, architecture elements (such as walls and openings) can be stored in several SDAI models. Each of these models may correspond to a part of the building (e.g., a floor) under the responsibility of an architect (i.e., a user). Each user needs to access his or her own data (his or her part of the project) to modify it. On the other hand, the user may occasionally need to access other parts for viewing or comparison (the last valid version is needed). This pattern of work guided the implementation of the default concurrent access to the application data. By default, SDAI models (i.e., PerDiS clusters) are locked in read mode. PerDiS automatically upgrades this mode to read-write when the application modifies the data. When the user needs to access parts of the project belonging to other users, applications can open SDAI models in local-copy mode to avoid being blocked when someone else is modifying the model. Data locked in local-copy mode can have its locks upgraded if applications need more control. This approach simplifies the porting of single-user STEP applications, making them work concurrently even though they were not designed to do so.
Fig. 5. Persistent distributed SDAI implementation in PerDiS. Boxes represent PerDiS clusters.
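The default locking policy for application data can be summarized with a small Python sketch. It models only the decision logic (which lock mode a model is opened in, and when that mode is upgraded); it does not implement distribution or blocking. The names `LockMode`, `ModelHandle` and `open_model` are hypothetical and are not part of the PerDiS or SDAI APIs. Choosing the mode from the model's owner, and rejecting writes on a local copy until the lock is upgraded explicitly, are simplifying assumptions of this sketch.

```python
from enum import Enum


class LockMode(Enum):
    READ = "read"
    READ_WRITE = "read-write"
    LOCAL_COPY = "local-copy"


class ModelHandle:
    """An opened SDAI model (one PerDiS cluster) with its current lock mode."""

    def __init__(self, name: str, mode: LockMode):
        self.name = name
        self.mode = mode
        self.data = {}

    def read(self, key):
        return self.data.get(key)

    def write(self, key, value) -> None:
        if self.mode is LockMode.READ:
            # A read lock is upgraded implicitly on the first modification.
            self.mode = LockMode.READ_WRITE
        elif self.mode is LockMode.LOCAL_COPY:
            # Assumption of this sketch: a local copy must be upgraded first.
            raise PermissionError("local copy: call upgrade() before writing")
        self.data[key] = value

    def upgrade(self) -> None:
        """Explicit upgrade for applications that need more control."""
        self.mode = LockMode.READ_WRITE


def open_model(name: str, owner: str, user: str) -> ModelHandle:
    """Open own models in read mode and other users' models as local copies."""
    mode = LockMode.READ if owner == user else LockMode.LOCAL_COPY
    return ModelHandle(name, mode)
```

A single-user application ported this way keeps calling `read` and `write` as before; only code that wants to modify another user's model has to deal with lock modes explicitly.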
4.3.4. Transactions
Once a SDAI session is started, an application can manipulate instances in stores in a transactional context: the application starts a transaction, locks objects (implicitly or explicitly), accesses its data, and finally commits or aborts the transaction. By default, transactions are mapped to PerDiS pessimistic transactions. The standard API is extended to support optimistic transactions for applications that do not want to block others and/or are able to perform reconciliation if conflicts arise.
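The difference between the two transaction styles can be illustrated with a toy, single-process Python sketch. It uses a generic version-checking scheme for the optimistic case and a plain mutex for the pessimistic case; this is an assumption made for illustration and not a description of how PerDiS implements transactions. All names (`Store`, `OptimisticTx`, `Conflict`) are hypothetical.

```python
import threading


class Conflict(Exception):
    """Raised when an optimistic transaction fails validation at commit."""


class Store:
    """Toy store mapping key -> (version, value)."""

    def __init__(self):
        self._items = {}
        self._lock = threading.Lock()

    def get(self, key):
        return self._items.get(key, (0, None))

    def pessimistic_update(self, key, fn):
        # Pessimistic: hold the lock for the whole read-modify-write,
        # so concurrent writers block instead of conflicting.
        with self._lock:
            version, value = self.get(key)
            self._items[key] = (version + 1, fn(value))

    def begin_optimistic(self):
        return OptimisticTx(self)


class OptimisticTx:
    """Optimistic: no lock while working; versions are validated at commit."""

    def __init__(self, store):
        self._store = store
        self._read_versions = {}
        self._writes = {}

    def read(self, key):
        if key in self._writes:
            return self._writes[key]
        version, value = self._store.get(key)
        self._read_versions.setdefault(key, version)
        return value

    def write(self, key, value):
        self.read(key)  # remember the version this write is based on
        self._writes[key] = value

    def commit(self):
        with self._store._lock:
            for key, seen in self._read_versions.items():
                if self._store.get(key)[0] != seen:
                    raise Conflict(key)  # caller may reconcile and retry
            for key, value in self._writes.items():
                version, _ = self._store.get(key)
                self._store._items[key] = (version + 1, value)
```

An application that catches `Conflict` can re-read the data, reconcile its changes and commit again, which is the situation the optimistic extension is meant for.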
4.3.5. Security
The security model in PerDiS is defined at two levels:
(1) Security of communications, which is transparent to the application.
(2) Security of stored data, expressed as access rights given to different users in a VE according to a task-role model (Coulouris, 1997). This model separates the security attributes from the data model definition. It allows the management of legal and data ownership aspects in a VE. No major modifications are needed to port applications that do not deal with security, except for the handling of access rights violations, which can be done at the highest level of an application.
Access rights are defined using the security management tools developed in PerDiS. The mapping of the SDAI storage structure to PerDiS clusters allows these tools to be used with the SDAI implementation without modification. SDAI models become the units of user security.
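As a rough illustration of access rights that are kept separate from the data model, the Python sketch below checks a user's rights on a SDAI model through the (task, role) pairs the user holds. It is one plausible reading of a task-role scheme, not the design specified in (Coulouris, 1997); the names `AccessPolicy`, `AccessViolation` and `check`, and the right names "read" and "write", are hypothetical.

```python
from dataclasses import dataclass, field


class AccessViolation(Exception):
    """Raised when a user lacks the requested right on a model."""


@dataclass
class AccessPolicy:
    """Access rights stored apart from the application data (hypothetical)."""
    # (task, role) -> {model name -> set of rights such as {"read", "write"}}
    grants: dict = field(default_factory=dict)
    # user -> set of (task, role) pairs the user currently holds
    assignments: dict = field(default_factory=dict)

    def grant(self, task, role, model, *rights):
        per_model = self.grants.setdefault((task, role), {})
        per_model.setdefault(model, set()).update(rights)

    def assign(self, user, task, role):
        self.assignments.setdefault(user, set()).add((task, role))

    def allowed(self, user, model, right) -> bool:
        return any(
            right in self.grants.get(pair, {}).get(model, set())
            for pair in self.assignments.get(user, set())
        )


def check(policy, user, model, right):
    """Raise AccessViolation unless the user holds the right on the model."""
    if not policy.allowed(user, model, right):
        raise AccessViolation(f"{user} may not {right} {model}")
```

Because the check is made per model, a ported application only needs a single handler for `AccessViolation` near its top level, in line with the porting remark above.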
5. Experiments
Besides the complete SDAI layer ported on top of PerDiS, several applications have been developed or ported to PerDiS to manage different aspects of VEs and to test the performance of the platform (such as security management tools, Xfig [a drawing tool for UNIX], OO7, etc.). This section presents two applications. The first is a mapping program that translates architectural data from the ISO AP225 (ISO 10303-225, 1996) format to the VRML (ISO 14772-1, 1997) format. This program has also been ported on top of a Corba ORB (Orbix) and its performance has been compared to that of the PerDiS version. The second application is a document management application that manages sets of files distributed over several servers and accesses them transactionally. This application shows the use of PerDiS in disconnected transactional operations with different kinds of data types.
5.1. AP225 to VRML mapping application
ISO AP225 (ISO 10303-225, 1996) is a standard format for representing building elements and their geometry; it is supported by a number of CAD tools. The application presented here reads this format and translates it into the Virtual Reality Modeling Language (VRML) (ISO 14772-1, 1997), to allow a virtual visit to a building project through a VRML navigator. This application was chosen because it is relatively simple, yet representative of the main kernel of a CAD tool. The original, stand-alone version is compared with a Corba and a PerDiS version.
The stand-alone version has two modules (see Fig. 6). The reader module parses a SPF (STEP Physical File format) (ISO 10303-21, 1994) file containing a building project, and instantiates the corresponding objects in memory. The generator module traverses the object graph to generate a VRML view, according to object geometry (polygons) and semantics. The object graph contains a hierarchy of high-level objects representing projects, buildings, storeys and staircases. A storey contains rooms, walls, openings and floors; these are represented by low-level geometric objects such as polyloops, polygons and points.
Fig. 6. Stand-alone version of the AP225 to VRML application.
In the Corba port, the reader module is located in a server, which then retains the graph in memory (see Fig. 7). The generator module is a client that accesses objects remotely at the server. To reduce the porting effort, only five classes were enabled for remote access: four geometric classes (Point, ListOfPoints, PolyLoop, and ListOfPolyLoops) and a class (Ap225SpfFile) that allows the client to load the SPF file and to get the list of polyloops to map. The porting task took two days (for only five classes). The code to access objects in the generator module had to be completely rewritten.
In the PerDiS port, the reader module runs as a transaction in one process and stores the graph in a cluster (see Fig. 8). The generator module runs in another process and opens that cluster. The porting task took only one day and all classes were made distributed with no modification of the application architecture. The PerDiS version has the advantage that the object graph is persistent, so it is not necessary to re-parse SPF files each time. The VRML views generated are identical to the original ones.
The stand-alone version is approximately 4000 lines of C++, in about 100 classes and 20 files. In the Corba version, only five of the classes were made remotely accessible, but 500 lines needed to be changed. In the PerDiS version, only 100 lines were changed.
Table 1 compares the three versions for various test sets and two different configurations:
* Std-Alone represents the original stand-alone application.
* The 1 machine column represents:
(1) The PerDiS version where the application accesses data stored on the same machine.
(2) The Corba version where the server and the client run on the same machine.
* The 2 machines column represents:
(1) The PerDiS version where the application accesses data stored on a different machine.
(2) The Corba version where the server and the client run on two different machines.
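Compared with a remote-object system, even a mature industrial product such as Orbix, the PerDiS approach yields much better performance: in Table 1 the PerDiS version runs roughly 5 to 61 times faster than the Corba version, depending on the test set and configuration.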
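The comparison can be made concrete by computing the Corba-to-PerDiS execution time ratio for each row of Table 1. The short Python script below does this; the figures are copied from Table 1, while the function and variable names are only illustrative.

```python
# Execution times in seconds, copied from Table 1.
# Columns: file size (Kb), stand-alone,
#          PerDiS and Corba on 1 machine, PerDiS and Corba on 2 machines.
TABLE_1 = [
    (293, 0.03, 1.62, 54.52, 2.08, 59.00),
    (633, 0.06, 4.04, 115.60, 4.27, 123.82),
    (725, 0.07, 4.04, 146.95, 5.73, 181.96),
    (2031, 0.16, 13.90, 843.94, 271.50, 1452.11),
]


def corba_over_perdis(rows):
    """Return (file size, ratio on 1 machine, ratio on 2 machines) per row."""
    return [
        (size, corba1 / perdis1, corba2 / perdis2)
        for size, _alone, perdis1, corba1, perdis2, corba2 in rows
    ]


for size, r1, r2 in corba_over_perdis(TABLE_1):
    print(f"{size:>5} Kb  1 machine: {r1:5.1f}x   2 machines: {r2:5.1f}x")
```

The smallest ratio comes from the largest file on two machines, where the PerDiS time itself rises sharply (271.50 s against 13.90 s on one machine).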
Table 2 shows the memory consumption comparison. The PerDiS version is almost identical to the stand-alone one, whereas the Corba version consumes an order of magnitude more memory, for reasons that were not clear. This experience confirms the intuition that the persistent distributed store paradigm performs better (in both time and space) than an industry-standard remote-invocation system, for data sets and algorithms that are typical of distributed VE design applications. It also confirms that porting existing code to PerDiS is straightforward and provides the benefits of sharing, distribution and persistence with very little effort.
Fig. 7. Corba implementation of the AP225 to VRML application.
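The memory claim can be checked in the same way against Table 2. The Python snippet below expresses the in-memory footprint of the PerDiS and Corba versions relative to the stand-alone version; the figures are copied from Table 2 and the names are illustrative.

```python
# Memory occupation in Kb, copied from Table 2.
# Columns: file size (Kb), stand-alone, PerDiS in memory,
#          PerDiS persistent, Corba.
TABLE_2 = [
    (293, 2269, 2073, 710, 26671),
    (633, 2874, 2401, 1469, 51054),
    (725, 3087, 2504, 1759, 59185),
]


def memory_relative_to_standalone(rows):
    """Return (file size, PerDiS/stand-alone, Corba/stand-alone) per row."""
    return [
        (size, perdis_mem / alone, corba / alone)
        for size, alone, perdis_mem, _persistent, corba in rows
    ]


for size, perdis, corba in memory_relative_to_standalone(TABLE_2):
    print(f"{size:>4} Kb  PerDiS: {perdis:.2f}x   Corba: {corba:.1f}x")
```

On these figures the PerDiS in-memory footprint is 81% to 91% of the stand-alone one, while the Corba version uses about 12 to 19 times as much memory as the stand-alone version.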
5.2. Project manager application
The second application presented in this paper is a document management application called project manager. This application aims at allowing applications not programmed for PerDiS to manage sets of files distributed among several servers and to access them transactionally. Sets of conventional files (text files, spreadsheets, CAD files, etc.) that are related and needed as a group in order to perform an activity can be put into a common project. The project manager is coupled with a browser, and any file at a PerDiS site that is made visible by the local WWW server can be added to a project. The coherence of the view provided over these sets of files is guaranteed by transactions performed within the project manager. When a project is created and files are inserted into it, these files are included in the PerDiS system and from that moment on their coherence, distribution, persistence and concurrency control are ensured by PerDiS. Every time a user wants to change files belonging to a project, he or she starts the project manager and checks out the files to a location of his or her choice (local directory, portable computer, floppy disk, etc.). The project manager can then be terminated and only has to be started again when the user wishes to check in the project's files. Meanwhile, any external application can be used to view or modify the files. When the files are checked in, a transaction is committed, which guarantees the ACID properties of the sequence of changes made by the external application.
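The check-out and check-in cycle described above can be mimicked on an ordinary file system with the Python sketch below. It is only an analogy for the workflow: the `Project` class and its methods are hypothetical, and the all-or-nothing check-in is approximated by staging the new file set in a sibling directory and swapping directories, whereas the project manager relies on a PerDiS transaction.

```python
import shutil
import tempfile
from pathlib import Path


class Project:
    """Toy project: a named set of files kept in a 'store' directory."""

    def __init__(self, store_dir):
        self.store = Path(store_dir)
        self.store.mkdir(parents=True, exist_ok=True)

    def add(self, name, text):
        (self.store / name).write_text(text)

    def check_out(self, work_dir):
        """Copy every project file to a location chosen by the user."""
        work = Path(work_dir)
        work.mkdir(parents=True, exist_ok=True)
        for f in self.store.iterdir():
            shutil.copy2(f, work / f.name)
        return work

    def check_in(self, work_dir):
        """Install all edited files together, or none of them.

        The new file set is staged next to the store and swapped in with
        two renames; if staging fails, the store is left untouched. A real
        system would need a transaction log to survive a crash mid-swap.
        """
        staged = Path(tempfile.mkdtemp(dir=self.store.parent))
        for f in Path(work_dir).iterdir():
            shutil.copy2(f, staged / f.name)
        old = self.store.with_name(self.store.name + ".old")
        self.store.rename(old)
        staged.rename(self.store)
        shutil.rmtree(old)
```

Between `check_out` and `check_in` the working copies are plain files, so any external editor can change them without knowing about the store.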
Fig. 8. PerDiS implementation of the AP225 to VRML application.
Table 1. Execution time comparison between the three implementations of the AP225-VRML mapping application: stand-alone version, PerDiS version (on 1 and 2 machines) and Corba version (on 1 and 2 machines)

SPF files                              Execution time (s)
File size  No. of SPF  No. of     Std-Alone  On 1 machine      On 2 machines
(Kb)       objects     polyloops             PerDiS  Corba     PerDiS  Corba
293        5200        530        0.03       1.62    54.52     2.08    59.00
633        12,080      1024       0.06       4.04    115.60    4.27    123.82
725        12,930      1212       0.07       4.04    146.95    5.73    181.96
2031       40,780      4091       0.16       13.90   843.94    271.50  1452.11
6. Conclusion
Today, the major difficulty in building a VE consortium is the lack of a software infrastructure that can integrate persistence, distribution, concurrent access control, data integrity, coherence and security. In addition, there is a clear need for an efficient interface that allows a fast porting of existing applications, to reduce the development effort needed to ensure the interoperability of different tools from different partners. This paper presented PerDiS, a new approach to developing VE software infrastructures using a persistent distributed store, which is based on lazy replication and a global co-operative coherent cache. This approach reduces the performance problem by avoiding the remote access techniques used by traditional distribution approaches, i.e., those based on RPC. Furthermore, PerDiS includes features such as implicit locking and optimistic and pessimistic transactions to allow a simple and fast porting of applications that were not developed to work in a transactional and concurrent environment.
Another major difficulty for VEs is the definition of common data models. For this purpose, the authors proposed relying on standardization efforts as a solution. ISO STEP techniques and methodologies are starting to be widely accepted in different manufacturing sectors. To open the PerDiS platform to a wide range of application domains, a distributed version of the SDAI for developing concurrent STEP tools was implemented. Finally, the paper presented a performance comparison of a VRML mapper ported to PerDiS and to an industrial implementation of Corba. More details about PerDiS concepts, implementation, performance and applications can be found in Ferreira (2000).
Acknowledgments
The authors would like to thank all partners of the PerDiS Esprit project 22533 who contributed to the design and the development of the platform and the test applications. Members of the PerDiS consortium are (in alphabetical order): CSTB (France), IEZ (Germany), INESC (Portugal), INRIA (France) and QMW (Great Britain).
Notes
1. The source code of the PerDiS platform is freely available at http://www.perdis.esprit.ec.org/
Table 2. Memory occupation comparison

SPF files                              Memory occupation (Kb)
File size  No. of SPF  No. of     Std-Alone  PerDiS                  Corba
(Kb)       objects     polyloops             In memory  Persistent
293        5200        530        2269       2073       710         26,671
633        12,080      1024       2874       2401       1469        51,054
725        12,930      1212       3087       2504       1759        59,185

References
Agrawal, D., Bernstein, A. J., Gupta, P. and Sengupta, S. (1987) Distributed optimistic concurrency control with reduced rollback. Distributed Computing, 2, 45-59.
Atkinson, M. P., Bailey, P. J., Chisholm, K. J., Cockshott, P. W. and Morrison, R. (1983) An approach to persistent programming. The Computer Journal, 26(4), 360-365.
Berners-Lee, T., Masinter, L. and McCahill, M. (December 1994) Uniform Resource Locators. RFC 1738.
Camarinha-Matos, L. M. and Afsarmanesh, H. (1999) The virtual enterprise concept. Infrastructure for Virtual Enterprise, Kluwer Academic Publishers, Boston, USA.
Coulouris, G., Dollimore, J. and Roberts, M. (1997) Security Services Design. PerDiS deliverable PDS-R-97-008. http://www.perdis.esprit.ec.org/deliverables/docs/T.D.1.1/A/T.D.1.1-A.html.
Faraj, L., Alshawi, M., Aouad, G., Child, T. and Underwood, J. (1999) Distributed object environment: using international standards for data exchange in the construction industry. Computer-Aided Civil and Infrastructure Engineering, 14, Blackwell Publishers, New Jersey, USA.
Ferreira, P., Shapiro, M., Blondel, X., Fambon, O., Garcia, J., Kloosterman, S., Richer, N., Roberts, M., Sandakly, F., Coulouris, G., Dollimore, J., Guedes, P., Hagimont, D. and Krakowiak, S. (February 2000) PerDiS: Design, implementation and use of a PERsistent Distributed Store. Recent Advances in Distributed Systems, Krakowiak, S. and Shrivastava, S. K. (eds), Lecture Notes in Computer Science, 1752, Springer Verlag, Heidelberg, Germany.
Fohn, S. M., Greef, A., Young, R. E. and O'Grady, P. (1995) Concurrent engineering. Lecture Notes in Computer Science, 973, Springer Verlag, Heidelberg, Germany.
Fowler, J. (1995) STEP for data management, exchange and sharing. Technology Appraisals, Great Britain.
Hardwick, M., Spooner, D. L., Rondo, T. and Morris, K. C. (February 1996) Sharing manufacturing information in virtual enterprise. Communications of the ACM, 39, 2.
ISO 10303-11 (1994) Industrial automation systems and integration - Product data representation and exchange - Part 11. Description methods: The EXPRESS language reference manual.
ISO 10303-21 (1994) Industrial automation systems and integration - Product data representation and exchange - Part 21. Implementation methods: Clear Text Encoding of the Exchange Structure.
ISO 10303-22 (1996) Industrial automation systems and integration - Product data representation and exchange - Part 22. Implementation methods: Standard Data Access Interface specification.
ISO 10303-225 (1996) Industrial automation systems and integration - Product data representation and exchange - Part 225. Application Protocol: Building Elements Using Explicit Shape Representation.
ISO 10303-23 (January 1997) Industrial automation systems and integration - Product data representation and exchange - Part 23. C++ programming language binding to the standard data access interface. ISO TC184/SC4/WG11 N004.
ISO 14772-1 (1997) The Virtual Reality Modelling Language.
Kalay, Y. E. (1998) Computational environment to support design collaboration. Automation in Construction, 8, Elsevier Science, Amsterdam, Netherlands.
Köthe, M. (January 1997) COST Architecture: the Corba Access to STEP Information Storage - Architecture and Specification. Deliverable D301 of ESPRIT 20408 VEGA project.
Mowbray, T. J. and Zahavi, R. (1996) The Essential CORBA: System Integration Using Distributed Objects, John Wiley and Sons, New York, USA.
Object Management Group (December 1998) Corba Services Specification. http://www.omg.org/corba/sectran1.html.
Object Management Group (September 1997) The Common Object Request Broker Architecture and Specification (CORBA) Revision 2.1. http://www.omg.org/corba/corbaiiop.htm.
Sandakly, F., Garcia, J., Ferreira, P. and Poyet, P. (1999) PerDiS: an infrastructure for cooperative engineering in virtual enterprises. Infrastructures for Virtual Enterprises, Networking Virtual Enterprises, Kluwer Academic Publishers, Boston, USA.
Shapiro, M. (1997) The PerDiS Architecture. Deliverable PDS-R-97-002 of ESPRIT 22533 PerDiS project. http://www.perdis.esprit.ec.org/deliverables/docs/architecture.
Stroustrup, B. (1986) The C++ Programming Language, Addison-Wesley, New York, USA.
The National Industrial Information Infrastructure Protocols (NIIIP) Consortium (1996) The NIIIP Reference Architecture (Revision 6). http://www.niiip.org/public-forum/NTR96-01/NTR96-01-HTML-PS/niplongd.html.
Wilbur, S. (June 1994) Computer support for co-operative teams: Applications in concurrent engineering. IEEE Colloquium on Current Development in Concurrent Engineering Methodologies and Tools.
Copyright Kluwer Academic Publishers Apr 2001