Content area
We present a formal semantics for an object-oriented specification language. The formal semantics is presented as a conservative shallow embedding in Isabelle/hol and the language is oriented towards ocl formulae in the context of uml class diagrams. On this basis, we formally derive several equational and tableaux calculi, which form the basis of an integrated proof environment including automatic proof support and support for the analysis of this type of specifications. We show applications of our proof environment to data refinement based on an adapted standard refinement notion. Thus, we provide an integrated formal method for refinement-based object-oriented development. [PUBLICATION ABSTRACT]
Acta Informatica (2009) 46:255284
DOI 10.1007/s00236-009-0093-8
ORIGINAL ARTICLE
Achim D. Brucker Burkhart Wolff
Received: 6 January 2004 / Accepted: 16 February 2009 / Published online: 25 March 2009 Springer-Verlag 2009
Abstract We present a formal semantics for an object-oriented specication language. The formal semantics is presented as a conservative shallow embedding in Isabelle/hol and the language is oriented towards ocl formulae in the context of uml class diagrams. On this basis, we formally derive several equational and tableaux calculi, which form the basis of an integrated proof environment including automatic proof support and support for the analysis of this type of specications. We show applications of our proof environment to data renement based on an adapted standard renement notion. Thus, we provide an integrated formal method for renement-based object-oriented development.
1 Introduction
The Unied Modeling Language (uml) [29] has been widely accepted throughout the software industry for modeling object-oriented software systems and is successfully applied to diverse domains [19]. uml is supported by major Computer Aided Software Engineering (case) tools and integrated into an object-oriented software development process model that stood the test of time. The Object Constraint Language (ocl) [28] is a textual extension of the core uml that allows for constraining uml models.
In research communities, uml/ocl has attracted interest for various reasons:
1. it is a formalism with a programming language face, which is perhaps easier to adopt by software developers notoriously hostile to mathematical notation,
A. D. Brucker (B)
SAP Research, Vincenz-Priessnitz-Str. 1, 76131 Karlsruhe, Germany e-mail: [email protected]: http://www.brucker.ch/
B. Wolff
Universit Paris-Sud, Parc Club, 4, Rue Jaques Monod, 91893 Orsay Cedex, France e-mail: [email protected]: http://www.lri.fr/~wolff/
Semantics, calculi, and analysis for object-oriented specications
123
256 A. D. Brucker, B. Wolff
2. it puts forward the idea of an object-oriented specication formalism, turning objects and inheritance into the center of the modeling technique, and
3. it provides in many respects a core language for object-oriented modeling which makes it a good target for research of object-oriented semantics.
Here, item 1 refers not only to the concrete syntax (which we will largely ignore throughout this paper), but also to semantics: ocl semantics comprises the notion of undenedness to model exceptional computations abstractly; this concept is deeply integrated into the logic and presents a particular challenge to deductive systems. Further, especially item 2 makes ocl rather different from logical languages such as rst-order logic, higher-order logic, set theory and derived specication formalisms such as Z [35] or vdm [16]. Following a long platonic tradition in logic, these languages have a foundation in the notion of values and the definition of (hierarchies of) relations over them. In contrast, ocl allows for specifying constraints on the state consisting of object instances linked via references, i.e., its object graphs, and the transition relation over this state. This remarkably different perspective makes semantics for object-oriented specications difcult. Comparing ocl with the two related approaches jml and Spec#, the main difference is that ocl attempts to abstract from concrete object-oriented programming languages, while jml and Spec# are designed as annotation-languages for a particular one.
In this paper, we present hol-ocl, which is a language, based on the uml/ocl standard, as well as a system and a methodology. We will use hol-ocl to illustrate our contributions, which we divide into the following categories:
1. We present a machine-checked formal semantics for object-oriented data models (e.g., class systems as present in the uml) comprising subtyping, single-inheritance, casting, dynamic and static types.
2. We present a formal, machine-checked semantics for uml and its assertion language ocl; Since both semantics are presented as a type-safe, shallow embedding, they are particularly suited as a basis for a proof environment.
3. Based on this semantic embedding, we derive several proof calculi consisting of derived rules constructed by machine-checked proofs, in particular equational and tableaux calculi.
4. We develop specic proof automation, both on the level of the data models, and on the level of the derived rules.
5. We present a method to analyze object-oriented data-models and combine it with a forward renement method [27,35]; thus, high-level specications can be rened to executable ones.
Except from background presentation in Sect. 2, the plan of the paper follows the structure of our contribution list: Sect. 3 is devoted to the conservative embedding of the semantics assertions on them, Sect. 4 to the derivation of the proof calculi and automated proof-support, and Sect. 5 is concerned with the methodology of specication analysis and renement.
2 Background
2.1 A guided tour through uml/ocl
The Unied Modeling Language (uml) comprises a variety of model types for describing static (e.g., class models, object models) and dynamic (e.g., state-machines, activity graphs) system properties. One of the more prominent model types of the uml is the class model
123
Semantics and proof for object-oriented specications 257
Fig. 1 A simple uml class model representing a conference system for organizing conference sessions: persons can participate, in different roles, in a session
(visualized as class diagram) for modeling the underlying data model of a system in an object-oriented manner. As a running example, we model a part of a conference management system. Such a system usually supports the conference organizing process, e.g., creating a conference Website, reviewing submissions, registering attendees, organizing the different sessions and tracks, and indexing and producing the resulting proceedings. In this example, we constrain ourselves to the process of organizing conference sessions; Fig. 1 shows the class model. We model the hierarchy of roles of our system as a hierarchy of classes (e.g., Hearer, Speaker, or Chair) using an inheritance relation (also called generalization). In particular, inheritance establishes a subtyping relationship, i.e., every Speaker (subclass) is also a Hearer (superclass).
A class does not only describe a set of instances (called objects), i.e., record-like data consisting of attributes such as name of class Session, but also operations dened over them. For example, for the class Session, representing a conference session, we model an operation findRole(p:Person):Role that should return the role of a Person in the context of a specic session; later, we will describe the behavior of this operation in more detail using ocl. In the following, the term object describes a (run-time) instance of a class or one of its subclasses.
Relations between classes (called associations in uml) can be represented in a class diagram by connecting lines, e.g., Participant and Session or Person and Role. Associations may be labeled by a particular constraint called multiplicity, e.g., 0..*or 0..1, which means that in a relation between participants and sessions, each Participant object is associated to at most one Session object, while each Session object may be associated to arbitrarily many Participant objects. Furthermore, associations may be labeled by projection functions like person and role; these implicit function definitions allow for ocl-expressions like self.person, where self is a variable of the class Role. The expression self.person denotes persons being related to the a specic object self of type role. A particular feature of the uml are association classes (Participant in our example) which represent a concrete tuple of the relation within a system state as an object; i.e., associations classes allow also for dening attributes and operations for such tuples. In a class diagram, association classes are represented by a dotted line connecting the class with the association. Associations classes can take part in other associations. Moreover, uml supports also n-ary associations (not shown in our example).
We rene this data model using the Object Constraint Language (ocl) for specifying additional invariants, preconditions and postconditions of operations. For example, we specify that objects of the class Person are uniquely determined by the value of the name attribute and that the attribute name is not equal to the empty string ():
123
258 A. D. Brucker, B. Wolff
context Personinv : name
.=
Person :: allInstances () - > isUnique (p: Person | p. name )
Moreover, we specify that every session has exactly one chair by the following invariant (called onlyOneChair) of the class Session:
context Sessioninv onlyOneChair :self . participants -> one ( p: Participant |isTypeChair (p.role))
where isTypeChair (p.role) evaluates to true, if p.role is of dynamic type Chair. Besides the usual static types (i.e., the types inferred by a static type inference), objects in uml and other object-oriented languages have a second dynamic type concept. This is a consequence of a family of casting functions (written o[C] for an object o into another class type C) that allows for converting the static type of objects along the class hierarchy. The dynamic type of an object can be understood as its initial static type and is unchanged by casts (see Sect. 3 for details). We complete our example by describing the behavior of the operation findRole as follows:
context Session :: findRole ( person : Person ): Rolepre : person self . participates . person
post : result = self . participants -> one (p: Participant |
p. person .
= person ). role
self . participants
.= self . participants@pre
.= self . name@pre
where in post-conditions, the operator @pre allows for accessing the previous state.
In uml, classes can contain attributes of the type of the dening class. Thus, uml can represent (mutually) recursive datatypes. Moreover, ocl introduces also recursively specied operations.
A key idea of dening the semantics of uml and extensions like SecureUML [8] is to translate the diagrammatic uml features into a combination of more elementary features of uml and ocl expressions [13]. For example, associations are usually represented by collection-valued class attributes together with ocl constraints expressing the multiplicity. Thus, having a semantics for a subset of uml and ocl is tantamount for the foundation of the entire method.
2.2 Formal and technical background of hol in Isabelle
2.2.1 The logical framework Isabelle
Isabelle [23] is a logical framework providing a logical core language based on an intuitionistic fragment of higher-order logic (hol). Isabelle is based on a typed version of the -calculus: types are dened as ::= | (, . . . , ), where the set of type variables is ranging
over , , . . ., and where the set of type constructors contains the function space _ _.
Further, we use inx notation; e.g., instead of _ _(1, 2) we write 1 2; multiple
applications like 1 (. . . (n n+1) . . .) are also written as [1, . . . , n] n+1.
self . name
123
Semantics and proof for object-oriented specications 259
The terms of Isabelle are -terms dened as ::= C | V | V. | , where C is the set of
constants, V is the set of variables like x, y, z. Abstractions and applications are written x.e and e e or e(e ). A subset of -terms may be typed, i.e., terms may be associated to types by an inductive type inference system similar to the programming language Haskell. The built-in logical core language comprises a congruence _ _, a meta-implication _ _
and a meta-quantier x.Px. The meta-implication helps to represent logical rules: a Horn-clause A1 . . . An An+1, written [[A1; . . . ; An]] An+1, is viewed as a rule
of the form from assumptions A1 to An, infer conclusion An+1:
A1 . . . An
An+1
[A1]
A2 .
A
(1)
The second rule to the right represents the natural deduction rule if A2 can be inferred from assumption A1, infer A which is represented in Isabelle as a rule of the form (A1
A2) A.
The meta-quantier helps to represent eigenvariables and turns out to be a exible mechanism to represent Skolemization for quantiers; dually, Isabelles term language comprises meta-variables (denoted ?x, ?y, ?z, ) that represent terms to be substituted during proof. For example, universal quantiers are captured by the rules:
x.P(x)
and P(?x)
x.P(x)
.
(2)
The deduction engine of Isabelle is based on higher-order resolution; this means that the meta-variables are substituted during the inferences as needed.
2.2.2 The meta-language hol
Classical higher-order logic (hol) [4,11] is the instance of Isabelle which is mostly used and developed. A few axioms describe the logical core system based on the logical type bool with the logical connectives _, __, __ and _ _ which are constants of type bool bool or [bool, bool] bool. Quantiers are represented by higher-order abstract syntax; this means
that _._ and _._ are usual constants of type ( bool) bool and that terms of the form (x.P x) are written x.P. The Hilbert operator x.P returns an arbitrary x that makes P x
true. Further, there is the logical equality _ = _ of type [, ] bool.
This core language can be extended to large libraries comprising Cartesian product types _ _ with the usual projections fst and snd as well as type sums _ + _, with the injections
Inl and Inr. The set type Set can be introduced isomorphic to the function space bool,
i.e., to characteristic functions, and a typed set theory is introduced with the usual operators, e.g., _ _, _ _, _ _.
The hol type constructor assigns to each type a type lifted by . The function [rightanglesw]_[rightanglese] : denotes the injection, the function [rightanglenw]_[rightanglene] : its inverse for dened values.
Partial functions are just functions over which the usual concepts of domain
dom f and range ran f are dened. Moreover, on each type a test for denedness is
available via def x (x =).
x.P(x)
123
260 A. D. Brucker, B. Wolff
3 A formal semantics of ocl in Isabelle/hol
In this section, we formalize a core subset of uml and ocl. While this presentation of the uml/ocl semantics is largely equivalent to the formal semantics presented in the ofcial ocl standard [28, Appendix A] (which is based on [24]), we use a level of abstraction here that is in between the paper-and-pencil semantics presented in [25,28] and our machine-checked version [6,9]. In particular, our presentation differs in the following details: while [28, Appendix A] uses several interpretation functions I[[_]] mapping syntactic
expressions e of the language and contexts to values in some semantic domain D and
functions over them, we avoid I[[_]] and state these values and functions directly. Tech
nically, we use an implementation technique called shallow embedding [5]. Instead of one untyped semantic domain, we use a type-indexed family D and thus use a type discipline in our meta-language. These types are formal representations of ocl types handled in [28, Appendix A] only partially.
Moreover, our formalization follows the formal semantics presented in the non-normative part of the ocl standard [28, Appendix A] (which, in itself, is based on [25]). The (minor) differences of hol-ocl to the normative part of the ocl standard [28] are discussed in [6].
3.1 Validity and evaluations
The topmost goal of the formal semantics is to dene the validity statement:
(, ) | P, (3) where is the pre-state and the post-state of the underlying system and P is a Boolean expression (a formula). The assertion language of P is composed of1. operators on built-in data structures such as Boolean or set,2. operators of the user-dened data-model such as accessors, type-casts and tests, and3. user-dened, side-effect-free methods.
Informally, a formula P is valid if and only if its evaluation in the context (, ) yields true.
As all types in hol-ocl are extended by the special element denoting undenedness, we
dene formally:
(, ) | P P(, ) = [rightanglesw]true[rightanglese] . (4) Since all operators of the assertion language depend on the context (, ) and result in values that can be , all expressions can be viewed as evaluations from (, ) to a type . Con
sequently, all types of expressions have a form captured by the following type abbreviation
V () := state state , (5) where state stands for the system state and state state describes the pair of pre-state and
post-state and _ := _ denotes the type abbreviation.3.2 Strict operations
Following common terminology, an operation that returns if one of its arguments is is
called strict. The majority of the operations is strict, e.g., the Boolean negation is formally presented as:
I[[ X]] =
[rightanglesw][rightanglenw]I[[X]][rightanglene][rightanglese] if [[X]] = , otherwise .
(6)
123
Semantics and proof for object-oriented specications 261
where = (, ) and I[[_]] is just a notation marking the hol-ocl constructs to be dened.
This notation motivated by the definitions in the ocl standard [29]. In our case, I[[_]] is just
the identity, i.e., I[[X]] X. Moreover, we use syntactic overloading: the _ operator on
the right has type bool bool and refers to the logical negation of hol, while the _
operator on the left has type V (bool) V(bool) and refers to the hol-ocl negation. The
types are different such that confusion is systematically avoided; however, we will use a special high-lighting to improve the readability in this presentation. All these operators can be viewed as transformers on evaluations.
The binary case of the integer addition is analogous:
I[[X + Y ]] =
[rightanglesw][rightanglenw]I[[X]] [rightanglene] + [rightanglenw]I[[Y ]] [rightanglene][rightanglese] if I[[X]] = and I[[Y ]] = , otherwise.
(7)
Here, the operator _+_ on the right refers to the integer hol operation with type [int, int]
int. The type of the corresponding strict hol-ocl operator _ + _ is [V (int), V(int)]
V(int).
A slight variation of this definition scheme is used for the operators on collection types such as hol-ocl sets, sequences or bags:
I[[X Y ]] =
S[rightanglesw][rightanglenw]I[[X]][rightanglene] [rightanglenw]I[[Y ]][rightanglene][rightanglese] if I[[X]] = and I[[Y ]] = , otherwise.
(8)
Here, S (smash) is a function that maps a lifted set [rightanglesw]X[rightanglese] to if and only if X and to the
identity otherwise. Smashedness of collection types is the natural extension of the strictness principle for data structures.
Intuitively, the type expression V () is a representation of the type that corresponds to the hol-ocl type . Thus, we introduce the following type abbreviations:
Boolean := V (bool), Set() := V (set), (9)
Integer := V (int), and Sequence() := V (list). (10) These abbreviations exemplify the fact that the mapping of an expression E of hol-ocl type
T to a hol expression E of hol type T is injective and preserves well-typedness.
3.3 Boolean operators
There is a small number of explicitly stated exceptions from the general rule that hol-ocl operators are strict: the strong equality, the denedness operator and the logical connectives.
As a prerequisite, we dene the logical constants for truth, absurdity and undenedness. In the sequel, however, we omit the semantic bracket notation I[[_]] since it is redundant in
our setting. Thus, instead of I[[ T ]] = [rightanglesw]true[rightanglese], we write these definitions as follows:
T .[rightanglesw]true[rightanglese], F .[rightanglesw]false[rightanglese], and . . (11)
hol-ocl has a strict equality _ .
= _ which is dened similarly to the integer addition.
However, for specication purposes, we introduce also a strong equality _ [defines] _ which is dened as follows:
X [defines] Y .(X = Y ), (12)
123
262 A. D. Brucker, B. Wolff
where the _ = _ operator on the right denotes the logical equality of hol. The undenedness
test is dened by X (X [defines] ). The strong equality can be used to state reduction rules like: | (
.= X) [defines] .
The ocl standard requires a Strong Kleene Logic. In particular, it denes:
X Y .
of type [Boolean, Boolean] Boolean. The other Boolean connectives are dened
as follows:
X Y ( X Y ) X Y X Y (14) The logical quantiers are viewed as special operations on the collection types Set() or
Sequence(). Their definition in the ocl standard is very operational and restricted to the nite case; instead, we dene the universal quantication as generalization of the conjunction:
x X . P(X)
if X =,[rightanglesw]x [rightanglenw]X [rightanglene].[rightanglenw]P(.x)[rightanglene][rightanglese] if x [rightanglenw]X [rightanglene].P(.x) = ,[rightanglesw]false[rightanglese] if x [rightanglenw]X [rightanglene].P(.x) = [rightanglesw]false[rightanglese], otherwise; (15)
and the existential quantication is dened as follows:
x X . P(x)
3.4 An axiomatization of object-oriented data structures
In the previous sections, we described various built-in operations on datatypes and the logic. Now we turn to several families of operations that the user implicitly denes when stating a class model, e.g., described as a uml class diagram, as logical context of a specication. This is the part of the language where object-oriented features such as type casts, accessor functions, and tests for dynamic types play a role, and this is the issue that makes hol-ocl remarkably different from specication formalisms such as vdm or Z.
3.4.1 Class models: the syntax
In the following, we dene the notion of a class model precisely. A class model is a tuple CM = (C, A, M, _ < _) where: C, a set of class names containing at least the name of the common superclass, i.e., OclAny, A, a set of class attributes, a nite partial map M that assigns to each class identier a nite partial map assigning attributes to types: M C (A T ),
[rightanglesw][rightanglenw]X [rightanglene] [rightanglenw]Y [rightanglene][rightanglese] if X = and Y = , [rightanglesw]false[rightanglese] if X = [rightanglesw]false[rightanglese] or Y = [rightanglesw]false[rightanglese], otherwise.
(13)
.
x X . P(x)
. (16)
123
Semantics and proof for object-oriented specications 263
a partial irreexive order _ < _ on class names called class hierarchy; in particular, we assume X <OclAny for all X dom M.
For simplicity, we assume that attribute names are unique throughout this paper; this means that the domains of two elements of the range of M are always pairwise disjoint. For ci < cj , we will say that the class ci is a subclass of class cj ; the restriction that any class is a subclass of OclAny can be imposed without loss of generality. Furthermore, we assume a set of types inductively dened as follows:
T := C | OclAny | Boolean | Integer | Real | String
| Sequence(T ) | Set(T ) | Bag(T ) | T T.
(17)
All classes, referenced by their names, induce an own type, the class type, that represents the type of all objects of this class. Objects are typed pieces of data that contain values for each attribute of the associated class.
3.4.2 Class models: the induced signature
A class model induces several families of functions, which we will axiomatize in the following. This representation is simplied compared to the technique used in hol-ocl. hol-ocl constructs a model for this axiom system for a given class system by compiling it into a sequence of conservative definitions. The presented axioms are then derived from them. Due to space limitations, we will not go into detail and refer the interested reader to [6,10] for details.
The families of functions induced by a class model CM comprise:
1. for each C dom M, and each attribute a dom (M C) of type T , there is an attribute
accessor function _.a as well as an attribute accessor function _.a @pre of type C T ,
2. a C dom M indexed family of overloaded type-casts: _[C] of type X C for all X < C
or C < X,3. a C dom M indexed family of tests for the dynamic type of an object: isTypeC _, and
4. a C dom M indexed family of tests for the dynamic kind of an object: isKindC_.
Ad item 1: Accessor functions always return typed objects, or values, and never references; however, references are used internally in the semantic model of the object construction. The accessor functions return the value of the attribute of a given object if the attribute has type, e.g., Boolean, Integer, Real. If the attribute type has class type, the attribute contains a reference which is referenced in the state or pre-state, yielding again an object. Following the injectivity principle of the representation map, we give an accessor _.a (or: _.a @pre , referencing in the pre-state) of hol-ocl type C T the hol type of an evaluation transformer:
V (C) V (T ).
Ad item 2: Type-casts are the omnipresent glue in expressions of object-oriented languages. They are used to implement subtyping by interfacing objects up and down the class hierarchy to t them in as argument of an operation. Both up and down casts have to be semantically lossless; for example, it must be possible to cast an arbitrary object up to OclAny, include it into a set of Set(OclAny), exclude it later and cast again to the identical object (this is the way generic datatypes are implemented in, e.g., Java).
Ad item 3: Applying type-casts modies the static type of an object, i.e., the type inferred by static type inference. To enable down-casts appropriately, however, it must be possible to reconstruct for an object which type it had originally, i.e., at creation time. The test isTypeC obj holds if and only if the dynamic type of obj is C.
123
264 A. D. Brucker, B. Wolff
Ad item 4: A relaxation of the dynamic type test is the dynamic kind test which holds if and only if the dynamic type of obj is C or a subtype of C.
3.4.3 Class models: an axiomatization
First of all, all functions of the induced signature are strict. This means that the following scheme of rules hold:
.a = [C] = isTypeC = isKindC_ = (18) for all attributes a of a class model CM and all C dom M. These equalities are hol equalities
and both left and right hand sides are evaluations, i.e., functions depending on the context. By extensionality, these equalities express that both sides are congruent for all contexts (see also discussion in Sect. 4).
Furthermore, the following rule schemes express that the dynamic type remains unchanged while casting:
isTypeC obj[C] = isTypeC obj isKindCobj[C ] = isKindCobj (19) for all C, C dom M.
Moreover, we can re-cast an object safely, i.e., up and down casts are idempotent. However, casting an object deeper in the subclass hierarchy than its dynamic type results in undenedness. Furthermore, casting is transitive:
| isTypeB obj | (obj[A])[B] [defines] obj
| isTypeC obj | (obj[B])[A] [defines] obj[A]
where we assume A, B, C dom M and C < B < A.
The reason why these rules look slightly more complicated than, e.g., the strictness rules above, consists in the fact that these are conditional rules where all parts must refer to the same context .
Accessor functions to attributes that are not dened in a class are syntactically illegal and ruled out by the typing discipline.
3.5 Class invariants
A class invariant of class C is a formula invC(obj) that is satised by all objects in a state,i.e., a partial map of object-identiers to objects of type oid OclAny. The traditional way to dene invariants is via an own, class-indexed family of operators:
C ::allInstances() (, ) {obj[C] ran | isTypeC obj} (22) and a conjunction of the formulae:
x (C ::allInstances() ) . invC(x), (23)
which constrains the set of possible states :: state to the valid states.
| isTypeA obj , | obj[B] [defines]
(20)
| isTypeA obj , | isKindB obj
(21)
123
Semantics and proof for object-oriented specications 265
We use a special linked list as a simple example: we assume a class Node with an Integer attribute i and one attribute next of type Node. We want to characterize the subset of Nodeobjects within a state where in each node the value of i is greater than 5 and where each value of next is dened and again in this set. This is expressed by a subclass SNode with the following class invariant:
context SNodeinv defined : self . i > 5 isKindSNode sel f .next
As a result, invSNode enjoys the recursive equation:
invSNode(obj :: SNode) = obj. i > 5 obj. next invSNode(obj. next). (24)
The denedness of the next attribute is a consequence of the fact that the invariant must be valid, implying that isKindSNodesel f .next must be valid, implying that self . next must be dened. We believe that this implicitness with respect to denedness is a good motivation for using a three-valued logic. hol-ocl offers specialized support for the derivation of these equations which are crucial for many derivations (see [9] for details).
3.6 Operation specications
The semantics of an operation specication is a big-step transition-system semantics (similar to Z) and not a small-step Hoare-style semantics; it is therefore constructed by the conjunction of the validity of preconditions and postcondition and not by their implication. This means that an operation specication:
context C :: op(p1 : T1, . . . , pn : Tn) : Tn+1
pre : P post : Q
is presented by a straightforward conversion into the definition:
opC self p1 pn . result. | P self p1 pn | Q self p1 pn result (25)
This conversion makes the implicit input parameter self and output parameter result explicit.
Moreover, we allow the overriding of operations. This means that for each class Cm < Cm1, , C2 < C1 in a class model, a new operation specication can be given that is dened
on the corresponding subtypes of self . For a given class model, the combined specication operation for a method invocation is dened as:
op self p1 . . . pn if isKindCmself then opCm self p1 . . . pn else if isKindCm1self then opCm1 self p1 . . . pn
...
else if isKindC1self then opC1 self p1 . . . pn
endif endif endif
123
266 A. D. Brucker, B. Wolff
This order of resolving the invocation overloaded operations implements late-binding, a resolution mechanism widely used in object-oriented programming languages such as Java or C#.
3.7 Discussion: simplications underlying this presentation
The construction presented in this section is simplied in several aspects compared to the construction used in the hol-ocl [9] system. In particular:
1. The presented construction is based on a specic closed world assumption: it is implicitly assumed that a class model consists only of its xed number of classes (and thus also on attributes, methods, ). This is a consequence of the simplistic construction presented here: i.e., for each given class model, the corresponding set of axioms is generated. If the class model is extended or modied, another set of axioms will be generated. In practice, this means that all proofs built upon a class model must be re-validated after an extension, which limits the usefulness of the technique. In hol-ocl, a more rened technique is therefore used which constructs a model entirely based on conservative definitions which is in fact extensible, i.e., amenable to a form of modular verication. Since the details of the construction are quite involved, we constrained ourselves to this axiomatic fragment.
2. Moreover, the hol-ocl class model compiler also generates functions that allow to construct a state; i.e., there is an infrastructure to create objects, update attributes in them, and determine even their object-identier or reference within a state.
These more powerful constructions are described elsewhere [6,10].
4 Proof calculi for ocl
In this section, we present several deduction systems for hol-ocl. In particular, we dene two equational calculi well-suited for interactive proofs and a tableaux calculus geared towards automatic reasoning. All rules we present are derived within Isabelle from the semantic definitions introduced in Sect. 3. Therefore, we can guarantee the logical soundness of all these rules with respect to the core logic. These three calculi were used to instantiate Isabelles (two-valued) generic proof-procedures yielding in decision procedures for fragments of ocl.
4.1 Validity and universal congruence
Since ocl is a three-valued logic, each ocl expression either evaluates to true ( T ), false ( F ), or undened ( ). Thus, the following theorem holds:
( | A) ( | A) ( | ( A). (26)
The natural question arises, under which conditions two formulae are equivalent for all contexts , e.g., .A = B . Since the logical equality of hol enjoys extensionality and since
all ocl formulae are evaluations, we can express this simply by:
A = B. (27)
Equations on ocl formulae are called universal congruences; so far, we have seen equalities of this form in Eq. 18, in Eq. 19 and all reduction rules for strict functions like f = .
123
Semantics and proof for object-oriented specications 267
Due to Eq. 26, we can establish the following link between validity and universal congruence:
.( | X) = ( | Y )
. | ( X)
=
| ( Y )
.
(28)
This rule is sound also the other way round (for trivial reasons). Note, furthermore, that the underlying choice for evaluation to truth and evaluation to undened is arbitrary; whenever evaluations of two formulae agree on two out of the three cases, they are universally congruent.
4.2 The Universal Equational Calculus
The basis of Universal Equational Calculus (uec), see Table 1, are Horn-clauses over universal congruences which can be applied in arbitrary ocl expressions. A proof of a formula in uec is simply its reduction to T , since the following theorem holds:
( | ) = ( | T ) = ([rightanglesw]true[rightanglese] = [rightanglesw]true[rightanglese]) = true. (29) Based on the semantic definitions for the logical operators, it is not difcult to derive the laws of the surprisingly rich algebraic structure of Strong Kleene Logic.: both _ _
and _ _ enjoy associativity, commutativity and idempotency. The logical operators also
satisfy both distributivity and the de Morgan laws. It is essentially this richness and algebraic simplicity that can be exploited for normal-form computations as well as proofs-by-hand such as the example:
A1 Ak B Ak+1 An B
= A1 An ( B B) and since B = T , we can conclude
= A1 An T
and thus, reduce our proof goal to:
= T .4.3 Reasoning over ocl-equalities
The uec introduced in Sect. 4.2 cannot be complete: some facts will only hold in some contexts , not in all of them. Moreover, expressing foundational facts on the crucial strong
equality _ [defines] _ is not possible within uec, e.g., as equality on evaluations. Due to the handling of contexts, the usual congruence properties of an equality can only be approximated (see
Table 2). This is backed up by the well-formedness predicate cp(P) (called P is context-passing) which requires that the context is unchanged on its way through P. Formally, we can dene context passing cp(P):
cp(P) E.X .P X = E(X ), (30)i.e., each term P (a transformer on evaluations) containing a sub-term X (an evaluation) must be replaceable by some E which just takes the value of X constructed by evaluating it by .
At rst sight, the handling of cp_ seems to be infeasible; however, it can be established by a simple subcalculus that decides this property for all expressions P which are only -abstractions of terms built uniquely by ocl operators (see Table 3).
X = Y
123
268 A. D. Brucker, B. Wolff
Table 1 The Universal Equational Calculus
F X = F T X = TT X = X F X = X
X X = X X X = XX Y = Y X X Y = Y X
X (Y Z) = (X Y ) Z X (Y Z) = (X Y ) Z
( X) = X (X Y ) = ( X Y ) (X Y ) Z = (X Z) (Y Z) (X Y ) Z = (X Z) (Y Z)
(X Y ) = X Y (X Y ) = X Y(a) Lattice
F = T T = T = F X = T ( X) = X (X X) = T ( X X) = T
(X [defines] Y ) = T (X
.= Y ) = X Y ( if X then Y else Z endif ) = X (X Y X Z)
(X Y) = ( X Y) X Y (X Y) = ( X Y) X Y (X Y) = ( X Y) X Y(b) Strong denedness rules.
if then Y else Z endif = f = f Y = f X =
f X = X f X Y = X Y(c) Strictness/denedness rules for total strict operations f .
if T then Y else Z endif = Y if F then Y else Z endif = Z X F = X X T = T F X = T T X = X
X Y = X Y
X (Y Z) = (X Y) (X Z)
X (Y Z) = (X Y) (X Z) (X Y) Z = X (Y Z)
(X Y) Z = (X Z) (Y Z)
X (Y Z) = Y (X Z)
X = T(X X) = T(d) Logic
123
Semantics and proof for object-oriented specications 269
Table 2 Quasi-equational theory
| a [defines] a
| a [defines] b | b [defines] a
| a [defines] b | b [defines] c | a [defines] c
| a [defines] b | P a cp(P) | P b
Table 3 The core context passing rules
cp(X.X) cp(X.c)
cpP
cp(X. f1(P X))
cpP cpP
cp X. f2(P X)(P X)
cp(P) x.cp(P x)
cp(X. (P X) (x.(P x X)))
Where f1 and f2 are strict operators that are dened by one of the three definition schemes described in
Sect. 3.2, a logical connective, or a quantier
The importance of the strong equality becomes apparent with the following rules, which establish a link to strict equality, validity, invalidity and undenedness:
| a
.= b, | a [defines] b
(31)
| A = ( | A [defines] T ), (32) | A = ( | A [defines] F ), and (33)
( | A) = | ( A) = ( | A [defines] ). (34)
The trichotomy rule Eq. 26, the rules Eqs. 3234, in connection with the substitutivity (see Table 2) allow for case-splits in arbitrary sub-formulae.
4.4 A tableaux calculus on judgements
The tableaux methodology is a popular approach to design and implement proof-procedures. Originally, tableaux methods were geared towards rst-order theorem proving, in particular for non-clausal formulae accommodating equality. Nevertheless, renewed research activity is being devoted to investigating tableaux systems for intuitionistic, modal, temporal and many-valued logics, as well as for new families of logics, such as non-monotonic and substructural logics. Many of these recent approaches are based on a special labeling technique on the level of judgments, called labeled deduction [12,30]. Of course, labeling can also be embedded into a higher-order, classical meta-logic. For example, the conjunction introduction
123
270 A. D. Brucker, B. Wolff
Table 4 The translation rules
| A( | A) = ( | A)
| A | B( | A B) = ( | A | B)
| A | B( | A B) = ( | A | B)
| A | B( | A B) = ( | A | B)
| (S :: Set()) cpP( | x S . P x) = (x. | x S | P x)
| (S :: Set()) cpP( | x S . P x) = (x. | x S | P x)
and elimination rules can be presented in natural style supported by Isabelle:
| A | B | (A B)
[ | A, | B]
...
| A B R
R
(35)
or
The operational effect of these rules in backward-style derivations is to split a validity judgement into two or to replace a judgement with a conjunction in the assumption list by the two simpler judgements. In effect, formulae are transformed into clauses consisting of atomic validity judgments containing no further logical connectives.
Tableaux calculi for strong Kleene Logic based on labeled deduction have been extensively studied [14,15,17]. We also derived a tableaux calculus for ocl [9], but the approach turned out inefcient even for small formulae.
The problem becomes apparent when considering the rules in Table 1b: establishing the denedness, which is an omnipresent side-condition to many rules, leads to many redundant case-splits which again result in denedness-reasoning; by using clever rules and forward inference techniques, this effect can only partly be compensated. Furthermore, the process of nding uniers for quantiers by a multitude of closing rules turned out to be difcult to implement inside Isabelle.
4.5 A conversion calculus for ocl
Table 4 shows the conversion calculus which we derived as a practical automated reasoning procedure. This conversion calculus allows to convert the three-valued reasoning into classical reasoning in hol; the resulting expressions can then be handled by the standard tableaux procedures of Isabelle. Thus, if we can establish effectively once and for all if all related sub-expressions are dened, we can convert a formula into a classical logical expression containing only two-valued judgements.
123
Semantics and proof for object-oriented specications 271
4.6 A note on quantiers
In the following, we discuss the extension of the propositional fragment by bounded quantiers introduced for collections. For brevity, we will concentrate on the quantiers on sets,i.e., Set().
First, we present some universal equalities of the universal quantiers, which also satisfy
the usual context passing rules in Table 3. With respect to strictness rules, the definitions of the quantiers follow the usual scheme:
x . P x
x . P x
x . P x
x . P x
Thus, for smashed sets, bound variables are always dened.
4.7 Automated deduction
The question arises, how can these calculi be combined to proof procedures that decide certain fragments of the language, i.e., to what extent can automated proof support be provided for hol-ocl? This question is of vital importance for the practicability of a proof environment. We outline two procedures implemented in hol-ocl here, one for the predicative fragment, one for equational reasoning.
4.7.1 A practical decision procedure for the predicative fragment
As briey discussed in Sect. 4.4, a direct implementation as a specialized tableaux calculus is difcult, and the theoretic results are discouraging. However, deciding that an expression is dened is in practice surprisingly well-behaved. This is a consequence of the fact that the vast majority of operations in hol-ocl are total and strict.
Moreover, in practical relevant situations (such as proving that some follows from the class invariants), _ _ is the predominant logical connective; this fact can be exploited by
normal form computations using Eq. 35.
The fact that denedness is often well-behaved, raises the question what balance between forward and backward reasoning avoids most redundancy. As forward-inference component, we decided to case-split over the denedness of all free variables occurring in the formula. In principle, this results in exponentially many sub-formulae (called splinters); due to Eq. 34, the substitutivity rule (see Table 2), and the collection of strictness rules of Table 1c, they can be drastically simplied. In particular, this simplied formulae contain only variables
= , (37)
= T , and (38)
= , (36)
= F . (39)
Second, for the dened cases, we have again conversion rules that allow for a semantic mapping of the hol-ocl quantiers to their classical hol counterparts (see Table 4).
Moreover, if x S is valid, we know that x must be dened. This is a characteristic
property of smashed sets that yields the following property:
| x S .
| x
(40)
123
272 A. D. Brucker, B. Wolff
x for which the denedness is explicit: | x. Over the normalized splinters, we apply
the conversion rules Table 4; the decision of denedness of terms is now just linear over the size of terms. Thus, we achieved a (usually small) number of converted splinters which are ordinary classical rst-order formulae to be handled by Isabelles standard two-valued tableaux procedures.
The rules for cp are pre-computed for each hol-ocl operator, such that deciding this side-condition is linear in worst-case and logarithmic in the average case in the size of the term. It only involves trivial matches.
4.7.2 Equational reasoning for strong logical equality
The universal congruence rules can be directly processed by the standard rewriting mech
anism of Isabelle; the question is how to handle _ [defines] _ effectively. Besides the obvious approach (applying substitutivity in a specialized procedure, and repeating this for each rewrite rule), we developed a more efcient technique. The key observation is that property of being context-passing can be made explicit by the following rule:
| P X
, (41)
where K is the K -combinator (x y.x) and where we require cp(P). Applying this rule exhaustively to an evaluation leads to the cp-normal-form, and obviously, the transformation is reversible. Interestingly, rules like Eq. 20 can be converted into:
isTypeB (K (obj )) = T ,
K (K(obj ))[A] [B] = obj
| (K (P(K (X ))))
(42)
i.e., into a conventional conditional equality. In our approach, rewrite rules as well as proof-states are transformed into cp normal-form, then rewritten by standard simplication, and converted back. Both transformations are linear in the size of the terms; the matching is slowed down since the size of matches doubles. However, as a whole, these costs are neglectable and the resulting simplication procedure is similarly powerful as the original Isabelle simplier.
5 Applications
In the previous sections, we described a formal, machine-checked semantics for ocl and derived calculi suited for automated reasoning. In this section, we apply this theory: we present concepts and implementation of a methodology on top of hol-ocl, enforcing a particular use of our language. At present, the following aspects of the methodology are supported:
1. an analysis method enforcing a certain form of well-formedness considered pragmatically useful, and
2. a formal renement notion allowing to convert abstract class models into more concrete ones.
Since our renement notion is transitive, an original abstract design can be converted stepwise into a version that can be converted to code automatically. We also present an implementation of these concepts in the Isabelle framework, allowing to insert analytical commands in a proof-document resulting in proof-obligations to be discharged in the sequel; the technology is further described in [33].
=
123
Semantics and proof for object-oriented specications 273
5.1 Proving consistency
When capturing the requirements for a larger software system, the problem arises how to detect potential inconsistencies, contradictions or redundancies in larger numbers of class invariants or method specications. Thus, prior to any implementation or renement attempt, there is the need for a consistency analysis of the specication.
In the following, we concentrate on a specic kind of consistency that is required by the data renement methodology we present later. Our renement methodology has both syntactic (also called well-formedness requirements) and semantic requirements.
On the syntactical side, we require for public operations that the return value and their arguments are either basic datatypes (e.g., Integer, String) or public classes. We call a class public, if it contains a least one public attribute or operation; classes that do neither contain attributes nor operations are public by default.
On the semantic side, we need constraints on states , not state transitions , for the rst time. Instead of using syntactic side-conditions (e.g., as in [28, Appendix A]) like the assertion does not contain the @pre -operator, we use a slightly more general semantic characterization which is amenable in calculi. As a prerequisite, we dene two assertions , pre-state equivalent in , written ( | ) pre= ( | ):
( | ) pre= ( | ) x, y. ((, x) | ) = (, y) | , (43)
i.e., all post-states x and y are irrelevant. For example, this is the case for assertions , where all accessors occurring in them are accessing the pre-state (i.e., using _ @pre )
and where = . Analogously, we dene the concept of post-state equivalence, written
( | ) post= ( | ). Moreover, we introduce the notion pre-state validity:
( | pre ) ( | ) pre= ( | T ) (44)
and also analogously post-date validity: ( | post ).
Finally, we dene a syntactic transformation _pre of assertions to support certain syntactical conventions of ocl; pre results from by substituting all accessor functions by their _ @pre -counterparts. For @pre -free assertions (e.g., preconditions and invariants), the following property can be proven automatically:
( | post ) = ( | pre pre). (45)
Recall that a state is a partial map from object-identiers to objects; following the ocl standard, we call states valid if and only if each object in its range satises the class invariants. We write V for the set of valid states. The empty state oid. is always in V .
Now we can describe the proof-obligations of a consistent class model conceptually. For a consistent uml/ocl package we require:
1. there must exist a state satisfying the class invariants that contains an object for each public class, i.e.,
V, a1, . . . , an. | post invC1(a1) . . . | post invCn(an),
where C1, . . . , Cn are the public classes of the class model and invC1, . . . , invCn are the corresponding class invariants.
123
274 A. D. Brucker, B. Wolff
2. For all operations, there must be a pre-state satisfying the class invariants and input variables satisfying the precondition, i.e.,
V, p1, . . . , pn. | pre (preop p1 . . . pn)prefor all public operations op with arguments p1, . . . , pn of the class model.
3. For all operations, for each a pre-state satisfying the class invariants and all input variables satisfying the precondition, there must be a result and a post-state satisfying the class invariants and the postcondition, i.e.,
V, p1, . . . , pn. | pre (preop p1 . . . pn)pre
V, result.(, ) | postop p1 . . . pn resultfor all public operations op with arguments p1, . . . , pn and with the return value result.
This notion is motivated by the following observations: if the condition described in item 1 is violated, there is always a method for which no argument can be passed that satises the invariants; if the conditions of item 2 or item 3 are violated, there is either no legal input or no function that maps it to output, i.e., the specication is not implementable. In all these cases, this means that the operation specication just means the empty transition relation which is semantically possible, but methodologically not desirable.
5.2 Proving consistency of our example
Recall our abstract model of a conference system presented in Sect. 2.1 (e.g., Fig. 1) and assume that this model is dened in a package called AbstractSimpleChair. We start our consistency analysis by importing (and type-checking) the uml/ocl specication in hol-ocl:
import_model "SimpleChair.zargo" "AbstractSimpleChair.ocl" include_only "AbstractSimpleChair"
This results in an environment holding all definitions and various automatically derived simplication rules of the data model dened in the AbstractSimpleChair package. In particular, this includes the class invariants for the classes Person, Role, Participantand Session as well as the specication of the operation findRole. We continue with the analytical command:
analyze_consistency [data_renement] "AbstractSimpleChair"
which checks the syntactic requirements of our renement methodology and, moreover, results in the generation of ve proof obligations according to the schema described in the previous section. Each proof obligation is given an own name which can be used to process it. For example, if we discharge the second obligation resulting from the statement above, we can refer to it by:
po "AbstractSimpleChair.ndRole_enabled"
The system reacts by changing to proof mode and displaying the assertions:
V, self , P, R. | pre (P self . participantspre). (46)
123
Semantics and proof for object-oriented specications 275
In proof mode, this assertion can be rened through backward-reasoning by a sequence of regular Isabelle proof commands or by specic hol-ocl ones. The proof essentially consists in providing a witness for in form of an object graph with one Person and one Session object, where the participants list is just empty. Thus, the proof proceeds by establishing that (, x) | self . participantspre [defines] [] ; the hol-ocl simplier will then com
plete the proof. After reaching the nal proof state consisting of the formula true, one can state:
discharged
whereby this proof obligation will be erased from the database of proof obligations and added to the database of proven theorems.
5.3 Rening ocl specications
Data renement is a well-known formal development technique; a standard-example for data renement is Spiveys Birthday Book [27]. The key idea is to rene abstract, but easy-to-understand system models to more concrete, complex ones that are closer to an (executable) implementation. In prominent instances of the renement method such as the B-Method, the nal concrete model is converted to code via a trusted code-generator. According to a concrete formal renement notion (such as forward simulation or backward simulation, cf. [35]), stating that one model is a renement of another one can be veried by checking syntactic constraints and by discharging (proving) automatically generated proof obligations.
Again, we will build our renement method on the level of uml-packages: one containing the abstract model one containing the concrete model. We make the correspondence between abstract and concrete public classes and public operations on the basis of their name, i.e., classes or operations with same name correspond. This syntactic constraint allows for the direct substitutivity of the abstract package, i.e., in any place, where the specication requires the abstract package, we can also use the concrete one. To make renements on packages semantically working, several side-conditions have to be imposed:
the set of public classes of the abstract model must be included in the set of public classes of the concrete model;
the set of public operations in a concrete class must be a subset of the public operations in the corresponding abstract class, and
the types of the corresponding operations must match.
Renement notions are typically based on putting the abstract states a and concrete states c into relation. To do this, an abstraction relation R must be provided by the user. An important special case is when R is in fact a function mapping concrete states to abstract states; although the proof obligations can be simplied in the functional case, we present the general case here. A forward simulation renement S RFS T po1(S, R, T ) po2(S, R, T )
comes in two parts which turn into proof obligations when stated as proof goals. They are best explained with a diagram, such as Fig. 2. The rst condition po1 means that whenever an abstract operation S can make a transition, the corresponding concrete operation T can make a transition too. The second condition po2 appears in Fig. 2b. It states that whenever the concrete operation can make a step to a new system state c, then the abstract operation must be able to reach a state a that is in the abstraction relation to c.
123
276 A. D. Brucker, B. Wolff
(a) Proof Obligation I
(b) Proof Obligation II
Fig. 2 Proof obligations of a forward simulation renement
To formalize these two conditions, two prerequisites are necessary that are related to the three-valuedness of the language:
| M S ( | S | S) (47)
and
pre S { V | V. (, ) | M S}. (48) The former definition relaxes our notion of validity to evaluating to true or to exception, which makes the exception view of explicit. The second definition characterizes the set of
pre states in which an assertion S becomes valid. In these terms, the two proof obligations for an operation declared public in the abstract model can be expressed formally as follows:
po1(S, R, T ) a pre(S), c V.(a, c) R c pre(T ) (49) and
po2(S, R, T ) a pre(S), c V, c .(a, c) R (c, c) | M T
a V. (a, a) | M S (a , c ) R.
However, these definitions leave open how to construct this global abstraction relation and how arguments of the operations are handled.
As a means to solve both problems, we suggest that the user provides a family of local abstraction relations RC indexed by the public classes of the abstract model. Thus, we can relate input and result objects in the abstract state to corresponding objects in the concrete state. The global abstraction relation R can be constructed automatically by requiring that all abstract public objects can be associated one-to-one to concrete objects and that abstract objects relate to concrete objects with respect to a local abstraction relation RC. There may be public objects in the concrete model that do not correspond to public objects in the abstract model.
5.4 A brief description of the concrete SimpleChair model
While the abstract version of the system is a classical data-model concentrating on data entities and its relations, such a model is difcult to implement; partly because high-level
(50)
123
Semantics and proof for object-oriented specications 277
Fig. 3 The concrete SimpleChair model avoids the use of association classes and, as such, is easier to implement in programming languages like Java
notations such as association classes are not supported directly by many tools, partly because a conversion to sequence attributes containing direct links to associated objects is more efcient, but more difcult since the state must be kept valid.
Figure 3 illustrates the class model of the concrete model that we dene within the package ConcreteSimpleChair. The hol-ocl specication differs mainly in the specication of the findRole operation which now uses the features of sequences.
context Session :: findRole ( person : Person ): Rolepre : person self . participantspost : result .
= roles . at ( participants . indexOf (p ))
In contrast to the abstract variant, this specication is efciently executable. Moreover, an additional invariant constraining the Session class describes that the sequences storing the roles and participants are of equal length:
context Session
inv : ||participants||
.=
Person :: allInstances () - > isUnique (p: Person | p. name )
5.5 Proving data renements of the SimpleChair-example
We load the concrete model, analogously to the abstract model, into its own theory:
import_model "SimpleChair.zargo" "ConcreteSimpleChair.ocl" include_only ["ConcreteSimpleChair"]
Now we can import both theories into a renement theory and declare the abstraction relations. This task is supported by the statement
rene "AbstractSimpleChair" "ConcreteSimpleChair"
.= ||roles||
The specication of the class Person remains unchanged:
context Personinv : name
123
278 A. D. Brucker, B. Wolff
of the hol-ocl renement component. The execution of the statement performs the following activities:
1. checking the syntactic side-conditions mentioned in Sect. 5.3,2. declaring the local abstraction relation for the public classes, e.g., Person, Role, Session, of the abstract model,
3. constructing a predicate isPublica working for the objects of the data universe dened in the package AbstractSimpleChair,
4. constructing a predicate isPublicc working for the objects of the data universe dened in the package ConcreteSimpleChair,
5. dening the global abstraction relation R (using the up-to-now undened class abstractions), and
6. generating the renement proof-obligations for the public operation findRole.
The motivation for the declaration of local class abstractions, which leave the definition to the user to a later stage, is a pragmatic one: giving the correct (hol) type for an encoded hol-ocl expression is usually quite sophisticated and requires experimenting in nding a suitable abstraction. For example, the definition that relates Person objects just relates objects with same name attribute:
RPerson a c obja objc s. (a | AbstractSimpleChair. Person. name obja [defines] s)
(c | ConcreteSimpleChair. Person. name objc [defines] s). (51)
Recall that the class invariant for Person requires that its objects are uniquely dened by their name attribute.
We now turn to the question of how to combine the family of local abstraction relations RC to a global abstraction relation on states R. The core piece is the already mentioned requirement that there must be a one-to-one assignment between objects belonging to classes declared public in the abstract package. Furthermore, all assigned objects must be in the local abstraction relation, and the public visibilities must be preserved. Altogether, this is expressed as follows:
R f g.x ran . isPublica (K x) f (g x) = x
y ran . isPublicc (K y) g( f y) = y
x ran . isPublica (K x) isPublicc (K (g x))
x ran . isPublica (K x) Robj (K x)(K (g x)), (52) where K a = .a and isPublica and isPublicc are generated predicates that decide if an
object belongs to a public class in the abstract package. These predicates are just disjunctions of all dynamic type tests. Similarly, Robj is a generated predicate combining the local abstraction relations by casting them appropriately to the common superclass, i.e., OclAny, and conjoining them disjointly. Finally, from the above renement, two proof obligations arise expressing the renement condition for each operation. The proof in hol-ocl for the case of findRole proceeds as follows. We start by:
po "Renement.ndRole"
123
Semantics and proof for object-oriented specications 279
and get the display of:
pre S, pre T. RSession self self
pre S, pre T. RPerson p p
pre S, pre T. RRole result result .
AbstractSimpleChair. Session. ndRole self p result
RFS ConcreteSimpleChair. Session. ndRole self p result
The three assumptions constrain the intended renement relation to input and output parameters that are representable in the corresponding system state of the rening system. That is, for a person p in an abstract state, we must be able to relate it to a p -object in the concrete state.
This complication is a tribute to object-orientation: we cannot require, in a world of objects, that the arguments are simply equal as we could in a world of values. Rather, we must translate objects of one state to objects in another state to express the relation of object-graphs via its structure and not using the object-identiers (references) that establishes it. Fortunately, since our example does not involve deep object graphs representing input of an operation to be rened, the local abstraction relations boil down to forgetting the object-ids and turning the person-objects into values (strings for names). In the general case, co-induction will be required. The rest of the 120 line proof is fairly straightforward and involves mostly the proof that whenever the abstract precondition is satised, the corresponding concrete precondition is also satised, as well as that the concrete postcondition is translatable into the abstract postcondition. This proof is also closed with:
discharged
6 Conclusion
6.1 Achievements
We presented a formal, machine-checked semantics of hol-ocl as a conservative embedding into Isabelle/hol. hol-ocl strives for compliance with the uml/ocl standards, at least as far as the logic, its conception as an assertion language over object graphs, and the library types (except Set) are concerned. In some minor issues (as in the case of innite sets), hol-ocl generalizes the uml/ocl standard or makes it more precise (as in the case of smashed collections); in case of doubt, we opted for semantic definitions that resulted in simpler deductions.
On the basis of this conservative embedding, we derived several calculi and proof techniques for hol-ocl. Since deriving means that we proved all rules within an interactive theorem prover, we can guarantee both the consistency of the semantics as well as the soundness of the calculi. We developed automatic proof support for the derived calculi which are specialized to the language. In particular, the calculi led to rewriting and tableau-based decision procedures for certain fragments of hol-ocl. The novel procedures have been applied for library development as well as medium-sized case studies.
We demonstrated the potential for applications of such deduction-based tools for the hol-ocl system. We adopted classical analysis and data-renement notion to three-valued, object-oriented hol-ocl and showed how this can be used to relate specications to implementations, even if based on different data universes. Thus, we provide a solid basis for
(53)
123
280 A. D. Brucker, B. Wolff
turning object-oriented modeling using uml/ocl into a true formal method; at least as far as data-modeling aspects are concerned.
6.2 Related work
We distinguish three areas of related work: general work on the uml/ocl and model-driven engineering, work on development by renement and work on verication of object-oriented systems.
6.2.1 Model-driven engineering
The term Model-driven Engineering (MDE) [18,26] refers to the systematic use of models as primary engineering artifacts throughout the development life-cycle of software systems. In the broadest sense, the term model is used for descriptions in a machine-supported format, while the term systematic refers to machine-supported transformations between models or from models to code. As a software development paradigm, mde attracted interest in academia and industry.hol-ocl is in fact embedded in an mde framework [7]. It consists of a repository, which is a database managing different versions of models, an infrastructure to build model-transformations that has been used for the transformations of our running example in Sect. 5, and an experimental generic code-generator. Other model transformations described in [8] were used to transform security models described in SecureUML, a uml extension, together with a standard class model into a standard (secured) model. This transformation produces different proof-obligations; With the help of our framework, the combined model can be transformed to code, while the proof obligations making these transformations correct can be proven by hol-ocl.
6.2.2 Renement-oriented development methods
As a formal development method, the hol-ocl approach is most closely related to the B-Method [1] and its most recent incarnation: Event-B [2]. The B-Method has been applied to substantial case studies of safety-critical systems.
In contrast to this tradition, hol-ocl establishes a development method for object-oriented specications and programs. Besides subtyping and inheritance, this means that formulae are assertions over a graph of objects linked via object identiers. This introduces the technical complication that equality on values must be replaced by other user-dened equivalence relations, be it by using object-identiers or recursive predicates representing bi-simulations.
6.2.3 Object-oriented code-oriented analysis methods
Formal analysis of object-oriented systems has mostly been done in the context of code-verication. Systems like Boogie for Spec# [20], Krakatoa [22] or esc/Java [21] for Java/Java Modeling Language (jml) are using programming language code annotated by assertions. This is converted via a wp-calculus into proof-obligations which are handled by automated or interactive provers. While technically sometimes very advanced, the foundation of these tools is quite problematic: The generators usually supporting a large language are not veried, and it is not clear if the generated conditions are sound and complete with respect to the underling operational semantics. This turns out to be a particular problem if complex
123
Semantics and proof for object-oriented specications 281
memory models are involved; The second author witnessed several stunning inconsistencies in these models for Boogie; for other systems, similar problems have been reported.
In contrast to these approaches, hol-ocl generates a conservative model for objects and states (see [9,10]). Furthermore, hol-ocl is geared towards top-down development via renement, therefore complementary to these approaches. A rst step towards code-verication via hol-ocl is described in [10], where a Hoare-calculus for a small imperative language is derived.
A substantial body of literature on code-verication on object-oriented languages is based on deep embeddings in logical frameworks like Isabelle/hol. Examples are embeddings of Java-Fragments [31], among them NanoJava [32], which focuses on meta-theoretic proofs like completeness. It served as a formal reference semantics in several other projects; however, complex side-condition hamper the efciency of the reasoning considerably. While the approach is compatible to open world assumptions in principle, it is not easily amenable for modular verication.
Another code-based analysis tool is the KeY tool [3], which has a uml front-end and which is integrated into a state-of-the-art modeling tool; it is based on a two-valued version of dynamic rst-order logic combined with a fragment of Java. KeY offers a rather powerful, specialized proof-procedure for large fragments of the language. In contrast to our conservative development, the library is just axiomatized. Methodologically, the approach is geared towards code-verication.
6.3 Future work
6.3.1 Improving technical support
While our existing proof procedures for ocl are quite satisfactory, the overall efciency needs to be increased and the a larger fragments of the language (including automated procedures for arithmetic, for example) should be covered. More congurations of our code-generator [7] are desirable for a wider range of examples.
6.3.2 Integration of top-down and bottom-up-techniques
This paper explores the potential of hol-ocl as a top-down development framework. It is our vision to integrate model mde, development by renement, code-verication and code-testing in one framework and thus to provide a combined semantical foundation as well as practical means for analysis of specications.
6.3.3 Renement
It is straightforward to integrate other renement concepts into hol-ocl, e.g., backward simulation S RBS T [35].
Finally, it is highly desirable to link method specications to implementations in concrete code of a programming language like Java. Conceptually, this is a combination of the big-step ocl semantics with a Hoare Logic relating intermediate steps (cf. [34]). In [10], we present an implementation of this method within hol-ocl; for space reasons, we have to refer the reader interested in formal proofs of this approach to the hol-ocl distribution.
Acknowledgments We thank Lukas Brgger and Simon Meier for valuable discussions on the subject of this paper. Simon Meier implemented the described rewrite procedure.
123
282 A. D. Brucker, B. Wolff
Appendix: The syntax of OCL
The ocl 2.0 standard uses a concrete syntax for ocl that is inspired by object-oriented programming languages. Whereas this textual notation is likely to be accepted easily by software developers, it looks unfamiliar and way too verbose for people with a formal methods background, in particular for proof engineers. Thus we developed a concise, more mathematical notation for ocl as an alternative, and used only the latter throughout the paper. Technically, both notations can be used in hol-ocl.
Table 5 gives a brief overview over the translation table between both notations; the reader interested in a complete comparison may consult [6,9].
Table 5 Different concrete syntax variants for ocl
ocl (standard) Mathematical hol-ocl
x = y x .
. =y
o.oclIsUndefined() o
o.oclAsType(t) o[t]o.oclIsType(t) isTypet oo.oclIsKindOf(t) isKindto t::allInstances() t ::allInstances()
OclUndefined
o.oclIsUndefined() oo.oclIsDefined() o
x - y x y
x + y x + y
x * y x . y x / y x / y
-x x
true Tfalse Fx or y x y
x and y x y
not x x
x implies y x y
if c then x else y endif if c then x else y endif
X->size() || X || X->includes(y) y X
X->count(y) X ->count (y) X->includesAll(Y) X Y
X->isEmpty()
.= XX->exists(e:T|P(e)) e X . P(e)
X->forAll(e:T|P(e)) e X . P(e)
Set{}
X->union(Y) X Y
X->intersection(Y) X Y X->complement(Y) X 1
= y
x <> y x
OclAny
OclVoid
Integer
Boolean
Collection
Set
123
Semantics and proof for object-oriented specications 283
References
1. Abrial, J.R.: The B-Book: Assigning Programs to Meanings. Cambridge University Press, New York (1996)
2. Abrial, J.R.: Modeling in Event-B: System and Software Design. Cambridge University Press, New York (2009)
3. Ahrendt, W., Baar, T., Beckert, B., Bubel, R., Giese, M., Hhnle, R., Menzel, W., Mostowski, W., Roth, A., Schlager, S., Schmitt, P.H.: The KeY tool. Softw. Syst. Model. 4(1), 3254 (2005). doi:http://dx.doi.org/10.1007/s10270-004-0058-x
Web End =10. http://dx.doi.org/10.1007/s10270-004-0058-x
Web End =1007/s10270-004-0058-x
4. Andrews, P.B.: Introduction to Mathematical Logic and Type Theory: To Truth through Proof, 2nd edn. Kluwer, Dordrecht (2002)
5. Boulton, R., Gordon, A., Gordon, M.J.C., Harrison, J., Herbert, J., Tassel, J.V.: Experience with embedding hardware description languages in HOL. In: Stavridou, V., Melham, T.F., Boute, R.T. (eds.) Proceedings of the International Conference on Theorem Provers in Circuit Design: Theory, Practice and Experience, IFIP Transactions, vol. A-10, pp. 129156. North-Holland, Nijmegen (1993)
6. Brucker, A.D.: An interactive proof environment for object-oriented specications. Ph.D. thesis, ETH Zurich (2007). ETH Dissertation No. 17097
7. Brucker, A.D., Doser, J., Wolff, B.: An MDA framework supporting OCL. Electronic Communications of the EASST 5 (2006)
8. Brucker, A.D., Doser, J., Wolff, B.: A model transformation semantics and analysis methodology for SecureUML. In: Nierstrasz, O., Whittle, J., Harel, D., Reggio, G. (eds.) MoDELS 2006: Model Driven Engineering Languages and Systems. Lecture Notes in Computer Science, vol. 4199, pp. 306320. Springer, Berlin (2006). doi:http://dx.doi.org/10.1007/11880240_22
Web End =10.1007/11880240_22
9. Brucker, A.D., Wolff, B.: The HOL-OCL book. Tech. Rep. 525, ETH Zurich (2006)10. Brucker, A.D., Wolff, B.: An extensible encoding of object-oriented data models in HOL. J. Autom. Reason. 41, 219249 (2008). doi:http://dx.doi.org/10.1007/s10817-008-9108-3
Web End =10.1007/s10817-008-9108-3
11. Church, A.: A formulation of the simple theory of types. J. Symb. Log. 5(2), 5668 (1940)12. Gabbay, D.M.: Labelled Deductive Systems, Oxford Logic Guides, vol. 1. Oxford University Press, New York (1997)
13. Gogolla, M., Richters, M.: Expressing UML class diagrams properties with OCL. In: Clark, T., Warmer,J. (eds.) Object Modeling with the OCL: The Rationale behind the Object Constraint Language. Lecture Notes in Computer Science, vol. 2263, pp. 85114. Springer, Heidelberg (2002)14. Hhnle, R.: Efcient deduction in many-valued logics. In: International Symposium on Multiple-Valued Logics (ISMVL), pp. 240249. IEEE Computer Society, Los Alamitos (1994). doi:http://dx.doi.org/10.1109/ismvl.1994.302195
Web End =10.1109/ismvl.1994. http://dx.doi.org/10.1109/ismvl.1994.302195
Web End =302195
15. Hhnle, R.: Tableaux for many-valued logics. In: DAgostino, M., Gabbay, D., Hhnle, R., Posegga, J. (eds.) Handbook of Tableau Methods, pp. 529580. Kluwer, Dordrecht (1999)
16. Jones, C.B.: Systematic Software Development Using VDM, 2nd edn. Prentice-Hall, Upper Saddle River (1990). ISBN 0-13-880733-7
17. Kerber, M., Kohlhase, M.: A tableau calculus for partial functions. In: Collegium LogicumAnnals of the Kurt-Gdel-Society, vol. 2, pp. 2149. Springer, New York (1996)
18. Kleppe, A., Warmer, J., Bast, W.: MDA Explained. The Model Driven Architecture: Practice and Promise. Addison-Wesley, Reading (2003)
19. Kobryn, C.: UML 2001: a standardization odyssey. Commun. ACM 42(10), 2937 (1999). doi:http://dx.doi.org/10.1145/317665.317673
Web End =10.1145/ http://dx.doi.org/10.1145/317665.317673
Web End =317665.317673
20. Leino, K.R.M., Mller, P.: Modular verication of static class invariants. In: Fitzgerald, J., Hayes, I.J., Tarlecki, A. (eds.) FM 2005: Formal Methods. Lecture Notes in Computer Science, vol. 3582, pp. 2642. Springer, Heidelberg (2005). doi:http://dx.doi.org/10.1007/11526841_4
Web End =10.1007/11526841_4
21. Leino, K.R.M., Nelson, G., Saxe, J.B.: ESC/Java users manual. Tech. Rep. SRC-2000-002, Compaq Systems Research Center (2000)
22. March, C., Paulin-Mohring, C.: Reasoning about Java programs with aliasing and frame conditions. In: Hurd, J., Melham, T.F. (eds.) Theorem Proving in Higher Order Logics (TPHOLS). Lecture Notes in Computer Science, vol. 3603, pp. 179194. Springer, Heidelberg (2005). doi:http://dx.doi.org/10.1007/11541868_12
Web End =10.1007/11541868_12
23. Nipkow, T., Paulson, L.C., Wenzel, M.: Isabelle/HOLA Proof Assistant for Higher-Order Logic. Lecture Notes in Computer Science, vol. 2283. Springer, Heidelberg (2002). doi:http://dx.doi.org/10.1007/3-540-45949-9
Web End =10.1007/3-540-45949-9
24. Richters, M.: A precise approach to validating UML models and OCL constraints. Ph.D. thesis, Universitt Bremen, Logos Verlag, Berlin, BISS Monographs, No. 14 (2002)
25. Richters, M., Gogolla, M.: OCL: Syntax, semantics, and tools. In: Clark, T., Warmer, J. (eds.) Object Modeling with the OCL: The Rationale behind the Object Constraint Language. Lecture Notes in Computer Science, vol. 2263, pp. 4268. Springer, Heidelberg (2002)
123
284 A. D. Brucker, B. Wolff
26. Schmidt, D.C.: Guest editors introduction: Model-driven engineering. Computer 39(2), 2531 (2006). doi:http://dx.doi.org/10.1109/MC.2006.58
Web End =10.1109/MC.2006.58
27. Spivey, J.M.: The Z Notation: A Reference Manual, 2nd edn. Prentice-Hall, Upper Saddle River (1992)28. UML 2.0 OCL specication (2003). Available as OMG document. http://www.omg.org/cgi-bin/doc?ptc/03-10-14
Web End =http://www.omg.org/cgi-bin/doc?ptc/ http://www.omg.org/cgi-bin/doc?ptc/03-10-14
Web End =03-10-14
29. Unied modeling language specication (version 1.5) (2003). Available as OMG document. http://www.omg.org/cgi-bin/doc?formal/03-03-01
Web End =http://www. http://www.omg.org/cgi-bin/doc?formal/03-03-01
Web End =omg.org/cgi-bin/doc?formal/03-03-01
30. Vigan, L.: Labelled Non-Classical Logics. Kluwer, Dordrecht (2000)31. von Oheimb, D.: Analyzing Java in Isabelle/HOL: formalization, type safety and Hoare logic. Ph.D. thesis, Technische Universitt Mnchen (2001)
32. von Oheimb, D., Nipkow, T.: Hoare logic for NanoJava: auxiliary variables, side effects, and virtual methods revisited. In: Eriksson, L.H., Lindsay, P.A. (eds.) FME 2002: Formal MethodsGetting IT Right. Lecture Notes in Computer Science, vol. 2391, pp. 89105. Springer, Heidelberg (2002). doi:http://dx.doi.org/10.1007/3-540-45614-7_6
Web End =10.1007/ http://dx.doi.org/10.1007/3-540-45614-7_6
Web End =3-540-45614-7_6
33. Wenzel, M., Wolff, B.: Building formal method tools in the Isabelle/Isar framework. In: Schneider, K., Brandt, J. (eds.) TPHOLS 2007. Lecture Notes in Computer Science, vol. 4732, pp. 351366. Springer, Heidelberg (2007)
34. Winskel, G.: The Formal Semantics of Programming Languages. MIT Press, Cambridge (1993)35. Woodcock, J., Davies, J.: Using Z: Specication, Renement, and Proof. Prentice Hall International Series in Computer Science. Prentice Hall, Upper Saddle River (1996)
123
Springer-Verlag 2009