Abstract - The use of domain-specific languages (DSLs) can improve the productivity of software development. However, the complexity of DSL development prevents wider use of this technique. This paper proposes a language construction framework that aims to support the reuse of language elements and tools and thereby lower the cost of DSL development. It is based on a generic language that provides a syntactic substrate for the developed languages. Language elements are evaluated by special functions written in an existing general-purpose language. A system for static checking of code is also included. The paper presents example languages implemented using a prototype of the proposed system.
Keywords: Domain-specific language; generic syntax; metaprogramming; reusability.
I. INTRODUCTION
Developing software systems requires defining solutions to problems from different domains. In most cases it is more natural to describe these solutions using concepts and operations from the domain itself. This leads to introducing new concepts into the programming language through the mechanisms the language provides, for example classes or functions. However, a general-purpose language may not allow domain operations to be expressed naturally. In this case a domain-specific language (DSL), designed specifically for solving problems in the domain, can be introduced [1], [2].
A domain-specific language can be developed as a completely new language (external DSL), or it may be based on the syntax of the general-purpose language used to develop the rest of the system (internal DSL) [2]. While technically a subset of the host language, an internal DSL uses the host's syntactic elements in a special way that breaks the host language's common conventions. This allows it to appear as a separate language from the perspective of the language user.
Developing a new external language may be complex, involving the development of a parser, a compiler or interpreter, an editor, and other tools. Although some of these tasks can be automated using existing tools such as the parser generators Yacc [3] or ANTLR [4], many language artifacts still have to be defined. Internal DSLs, on the other hand, are constrained by the host language.
II. DSL CONSTRUCTION ON GENERIC LANGUAGE
Between these two approaches lies the use of an existing generic language as the basis of a DSL. We use the term generic language for a language that does not provide its own semantics and whose syntax is intended to serve as a basis or carrier for other languages. Examples of generic languages include XML [5] and S-expressions [6]. These languages provide a syntactic base for new languages and, in the case of XML, a whole infrastructure of supporting tools including parsers, validators, transformation tools, and editors.
In contrast to the general-purpose languages used for internal DSLs, a generic language does not include elements with predefined semantics that most DSLs do not need. This keeps the developed DSL focused on the domain and makes it possible to control which features are exposed to DSL programs.
However, the generic languages in current use are not fully sufficient for DSL development. While XML is popular as a basis for data formats and markup languages (such as HTML), its syntax is too verbose for programming languages [7]. S-expressions, on the other hand, are too uniform, which makes it difficult to distinguish different language structures visually.
The presented work is an attempt to develop a new generic language and related tools for DSL development. It also supports reusability of language components.
Many elements of different languages are similar, so it is desirable to allow such elements to be reused. A language may be composed from other languages or from language libraries, that is, collections of language elements intended for reuse [8].
The most common parts of languages can be included directly in the host language. Other parts, used only by some languages, become reusable modules. Developing a language on top of the common generic host language is thus reduced to defining new concepts in the host language. This means that composition of languages becomes composition of concepts in a single language, which makes it much easier to comprehend and implement.
III. EXTENSIBLE HOST LANGUAGE FRAMEWORK
A framework for domain-specific language development was designed based on the described principles. It uses a generic language that defines a common syntax for a whole family of domain-specific languages. We call it the Extensible Host Language (XHL) Framework, as an allusion to the Extensible Markup Language (XML). The framework consists of the host language and a set of tools for processing languages based on it (see Fig. 1).
The basic tool is a parser of the generic host language that analyses program text and produces a skeleton syntax tree (SST) based on it. Unlike the abstract syntax tree that contains concrete elements, the skeleton syntax tree contains generic language elements that correspond to basic syntactic shapes provided by the host language [9].
These shapes are used to define the elements of a concrete DSL. A DSL is in effect a subset of the generic language. Its elements and their properties are declared in a language schema. Other tools use this information to process programs in the language properly. The most notable of these tools is a validator that performs static checking of programs based on the language schema. This makes it possible to detect some errors before a program is evaluated.
The next part of the architecture is an interpreter. Its task is to evaluate a program after it has been parsed and validated. The interpreter is actually a framework that allows the language author to define subprograms, in the form of evaluation functions, that take care of the actual evaluation of DSL elements. The interpreter traverses a DSL program and executes the evaluation functions corresponding to its language elements. This allows a program to be processed in different ways depending on the needs of the language: it is possible to execute the actions specified in a program immediately, to create a model based on it, or to translate the program into another language.
Evaluation functions are implemented in a general-purpose language. In our case the implementation language is Java, and evaluation functions take the form of Java methods.
The language definition is also divided into composable modules. A module is a unit of language definition that contains several language elements together with their declarations and implementations. A language based on the generic syntax can be constructed from one or several modules.
A. Syntax
A language designed only as a host for other languages cannot define concrete language structures; these are provided by guest languages. The host language can define only simple basic structures (such as symbols or number literals) and generic shapes that allow composed structures to be created. Concrete elements of a guest language must be defined as specializations of these shapes. The result of parsing the host language is therefore a skeleton syntax tree.
The syntax of the proposed host language was designed to be flexible and yet easily readable. The choice of its elements is based on the analysis of common language elements in popular general-purpose languages described in [10]. It includes several shapes for common structures (see Fig. 2). Two of them are data structures, lists and maps, and the other two are control structures that express the structure of code, combinations and blocks. A combination is simply a sequence of language elements written on the same line. Combinations can be nested using parentheses, and by default they are interpreted as a function or operator application. In this respect they are quite similar to lists in Lisp. A block, on the other hand, is a sequence of language elements written on separate lines at the same level of indentation. Combinations can be seen as horizontal structures and blocks as vertical ones.
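To make the notion of generic shapes more tangible, the following is a minimal Java sketch of how a skeleton syntax tree could be represented. It is not taken from the prototype: the class names Node, Atom, Shape, ShapeKind, and SstDemo are hypothetical, maps are omitted for brevity, and the printed bracket notation is only a debugging aid, not the concrete syntax of the host language.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical skeleton syntax tree: nodes record only generic shapes,
// not the meaning that a concrete DSL later assigns to them.
abstract class Node {
    abstract String show();
}

// A symbol or a literal value (number, string, boolean).
class Atom extends Node {
    final Object value;
    Atom(Object value) { this.value = value; }
    String show() { return String.valueOf(value); }
}

// Map shapes are left out of this sketch.
enum ShapeKind { COMBINATION, BLOCK, LIST }

// A composed structure; one class serves every shape.
class Shape extends Node {
    final ShapeKind kind;
    final List<Node> children;

    Shape(ShapeKind kind, Node... children) {
        this.kind = kind;
        this.children = Arrays.asList(children);
    }

    String show() {
        String open, sep, close;
        switch (kind) {
            case COMBINATION: open = "("; sep = " ";  close = ")"; break;
            case BLOCK:       open = "{"; sep = "; "; close = "}"; break;
            default:          open = "["; sep = ", "; close = "]"; break;
        }
        StringBuilder sb = new StringBuilder(open);
        for (int i = 0; i < children.size(); i++) {
            if (i > 0) sb.append(sep);
            sb.append(children.get(i).show());
        }
        return sb.append(close).toString();
    }
}

class SstDemo {
    // A combination "repeat 3" followed by a block of two nested combinations.
    static Node sample() {
        return new Shape(ShapeKind.COMBINATION,
            new Atom("repeat"), new Atom(3),
            new Shape(ShapeKind.BLOCK,
                new Shape(ShapeKind.COMBINATION, new Atom("print"), new Atom("x")),
                new Shape(ShapeKind.COMBINATION, new Atom("+"), new Atom(1), new Atom(2))));
    }

    public static void main(String[] args) {
        System.out.println(sample().show()); // (repeat 3 {(print x); (+ 1 2)})
    }
}
```

Note that the tree says nothing about what repeat or print mean; that information would come from the schema and evaluation functions of a concrete language.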
In addition, the host language defines literals for values of basic data types such as numbers, strings, and booleans. There are also symbols, which have a special role: they express the names of both language elements and user-defined concepts.
The concrete syntax was chosen to avoid unnecessary noise and to make its constructs resemble those of popular general-purpose languages. The notation for data literals is similar to JSON, a data exchange format derived from the common data structures of JavaScript [11].
The syntax for combinations is similar to function application in Haskell or to a list in Lisp (but with parentheses only around nested combinations). Infix notation is also supported: symbols composed of special characters are treated as operators, and a combination containing an infix operator is translated into a combination with the operator in the first position. Blocks are defined using indentation, as in Haskell or Python. This makes scripts concise and clean.
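The translation of infix combinations can be sketched in Java as follows. This is an illustration under simplifying assumptions rather than the parser of the prototype: items are plain strings, the first operator found splits the combination, and operator precedence (which the description above does not specify) is ignored. The names InfixDemo, isOperator, toPrefix, and operand are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class InfixDemo {
    // A symbol built only from special characters is treated as an operator.
    static boolean isOperator(String token) {
        if (token.isEmpty()) return false;
        for (char c : token.toCharArray()) {
            if (Character.isLetterOrDigit(c) || c == '_') return false;
        }
        return true;
    }

    // Rewrites "left op right" into [op, left, right].
    // Combinations without an infix operator are copied unchanged.
    static List<Object> toPrefix(List<String> items) {
        for (int i = 1; i < items.size() - 1; i++) {
            if (isOperator(items.get(i))) {
                List<Object> result = new ArrayList<>();
                result.add(items.get(i));
                result.add(operand(items.subList(0, i)));
                result.add(operand(items.subList(i + 1, items.size())));
                return result;
            }
        }
        return new ArrayList<Object>(items);
    }

    // A single item stays an atom; several items form a nested combination.
    static Object operand(List<String> part) {
        if (part.size() == 1) return part.get(0);
        return toPrefix(part);
    }

    public static void main(String[] args) {
        System.out.println(toPrefix(Arrays.asList("x", "+", "1")));          // [+, x, 1]
        System.out.println(toPrefix(Arrays.asList("idle", "->", "active"))); // [->, idle, active]
        System.out.println(toPrefix(Arrays.asList("state", "idle")));        // [state, idle]
    }
}
```

After this step an operator application has the same shape as any other element application, so later stages do not need to treat operators specially.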
B. Evaluation
A programming language is defined as a set of sentences that can be constructed over some alphabet. Usually the alphabet is simply a set of ASCII or Unicode characters. However, if lexical analysis is treated as a separate step, the alphabet can be regarded as consisting of whole lexical units.
In the case of languages based on a generic language, the alphabet consists of the elements and shapes of the generic language. Importantly, the elements of such an alphabet are not organized in a sequence but have a hierarchical structure: they form a skeleton syntax tree. These structured elements are then used as the basis for defining new languages.
To define a language based on the generic host syntax, a set of language elements and their properties must be specified. Every language element in XHL is represented by a symbol, and these symbols can occur in a program. When an element symbol is the first item of a combination, the rest of the combination, including a block that may follow it, is treated as the arguments of the element application.
Every application of language element in a program is processed by its evaluation function. Arguments of the application are passed as arguments to the evaluation function and a return value of the function becomes the value of the application.
In XHL, evaluation functions are implemented in Java as methods of an object corresponding to a language module. In contrast to ordinary methods, however, an evaluation function has several special properties:
* It can process the code of its parameters instead of evaluating them.
* It can manipulate the evaluation environment, which contains the bindings of program names.
* Instead of a value, it can return a special object that will be evaluated on request or will generate code corresponding to the operation in a target language.
The control over the evaluation of arguments is based on the fact that arguments of evaluation function can be passed using one of the two modes:
* By value, where an element representing the argument is evaluated (using its evaluation function) and the result is passed to the evaluation function.
* Symbolically, where the evaluation function will receive code of an element representing the argument as a fragment of the skeleton syntax tree.
These properties allow evaluation functions to flexibly evaluate instances of language elements in a program. They can also introduce and manipulate names in a program and implement custom evaluation strategies using the ability to process code of parameters.
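The interplay of the two passing modes and the environment can be illustrated with a small Java sketch. It is not the API of the prototype: the names PassingModesDemo, EvalFn, ElementDef, and the tree encoding (a List is a combination, a String is a symbol, anything else is a literal) are assumptions made only for this example. The hypothetical element if receives its branches symbolically and therefore evaluates only one of them, which is an instance of a custom evaluation strategy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PassingModesDemo {
    // Hypothetical evaluation function: receives prepared arguments and the environment.
    interface EvalFn {
        Object apply(List<Object> args, Map<String, Object> env);
    }

    static class ElementDef {
        final boolean[] symbolic; // symbolic[i]: pass argument i as code, not as a value
        final EvalFn fn;
        ElementDef(boolean[] symbolic, EvalFn fn) {
            this.symbolic = symbolic;
            this.fn = fn;
        }
    }

    static final Map<String, ElementDef> ELEMENTS = new HashMap<>();
    static {
        // "+": both arguments are passed by value.
        ELEMENTS.put("+", new ElementDef(new boolean[] {false, false},
            (args, env) -> (Integer) args.get(0) + (Integer) args.get(1)));
        // "const": the name is passed symbolically, the value by value; binds the name.
        ELEMENTS.put("const", new ElementDef(new boolean[] {true, false},
            (args, env) -> { env.put((String) args.get(0), args.get(1)); return args.get(1); }));
        // "if": the condition by value, both branches as code; only one branch is evaluated.
        ELEMENTS.put("if", new ElementDef(new boolean[] {false, true, true},
            (args, env) -> eval((Boolean) args.get(0) ? args.get(1) : args.get(2), env)));
    }

    static Object eval(Object node, Map<String, Object> env) {
        if (node instanceof String) {
            if (!env.containsKey(node)) throw new IllegalStateException("unbound symbol: " + node);
            return env.get(node);
        }
        if (!(node instanceof List)) return node; // literal
        List<?> comb = (List<?>) node;
        ElementDef def = ELEMENTS.get(comb.get(0));
        if (def == null || def.symbolic.length != comb.size() - 1) {
            throw new IllegalArgumentException("bad application: " + comb);
        }
        List<Object> args = new ArrayList<>();
        for (int i = 1; i < comb.size(); i++) {
            // Symbolic arguments are handed over as tree fragments, the rest are evaluated first.
            args.add(def.symbolic[i - 1] ? comb.get(i) : eval(comb.get(i), env));
        }
        return def.fn.apply(args, env);
    }

    public static void main(String[] args) {
        Map<String, Object> env = new HashMap<>();
        eval(Arrays.asList("const", "x", Arrays.asList("+", 1, 2)), env);
        // The else branch mentions an unbound symbol but is never evaluated.
        System.out.println(eval(Arrays.asList("if", true, "x", "undefined"), env)); // 3
    }
}
```

If the branches of if were passed by value, the second call would fail on the unbound symbol before the element was even applied.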
C. Schema
It is useful to perform checking of a program before its evaluation, especially in cases where evaluation includes complex computations or has side effects. Static checking can be performed even interactively while a programmer is writing code. The checking system is based on properties of a language and its elements specified in the language schema.
The language schema describes the language elements defined by a particular module. It is written in a language that is itself based on XHL. The schema contains a list of all elements and their properties, including:
* Type of the value produced by the element.
* Parameters - their type and passing method (by value or symbolically).
* Symbols defined by element application.
* Human-readable description of the language element.
An example of a language schema declaring two elements is shown in Fig. 3. The element "+" is an operator with two numeric parameters that are passed by value; it produces a numeric value as its result. The second declaration describes the element "const", which is used to define a numeric constant. Its parameters are the symbol to be bound and a value. The first parameter must be passed symbolically, because it is an unbound symbol that cannot be evaluated. The defines section states that the element binds the symbol given as its first parameter to a value of type Numbers.
Static checking is based on a type system. Each element has a declared type and declared types of its expected parameters. The language schema also declares which new names each language element defines and the types of the objects bound to these names. A generalization relation between types can be specified as well. The validity of a program is then checked based on the compatibility of the types of nested elements.
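A minimal sketch of such a check in Java is given below. It only illustrates the idea of comparing declared parameter types with the types of nested elements; it does not model name definitions, and it is not the validator of the prototype. The schema is hard-coded instead of being read from a schema file, and the type names (Number, Integer, Text, Boolean) as well as the names TypeCheckDemo, Signature, isCompatible, and check are assumptions of this example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TypeCheckDemo {
    // Declared result type and parameter types of one language element.
    static class Signature {
        final String result;
        final List<String> params;
        Signature(String result, String... params) {
            this.result = result;
            this.params = Arrays.asList(params);
        }
    }

    static final Map<String, Signature> SCHEMA = new HashMap<>();
    static final Map<String, String> SUPERTYPE = new HashMap<>(); // type -> its generalization
    static {
        SCHEMA.put("+", new Signature("Number", "Number", "Number"));
        SCHEMA.put("<", new Signature("Boolean", "Number", "Number"));
        SCHEMA.put("length", new Signature("Integer", "Text"));
        SUPERTYPE.put("Integer", "Number");
    }

    // True if 'actual' equals 'expected' or is one of its specializations.
    static boolean isCompatible(String actual, String expected) {
        for (String t = actual; t != null; t = SUPERTYPE.get(t)) {
            if (t.equals(expected)) return true;
        }
        return false;
    }

    // Returns the type of a node, or throws if nested types are incompatible.
    // Encoding for this sketch: List = element application, Integer and String = literals.
    static String check(Object node) {
        if (node instanceof Integer) return "Integer";
        if (node instanceof String) return "Text";
        List<?> app = (List<?>) node;
        Signature sig = SCHEMA.get(app.get(0));
        if (sig == null) {
            throw new IllegalArgumentException("unknown element: " + app.get(0));
        }
        if (sig.params.size() != app.size() - 1) {
            throw new IllegalArgumentException("wrong number of arguments for " + app.get(0));
        }
        for (int i = 0; i < sig.params.size(); i++) {
            String actual = check(app.get(i + 1));
            if (!isCompatible(actual, sig.params.get(i))) {
                throw new IllegalArgumentException(
                    app.get(0) + ": expected " + sig.params.get(i) + " but found " + actual);
            }
        }
        return sig.result;
    }

    public static void main(String[] args) {
        // length "abc" < 20 : Integer is accepted where Number is expected.
        System.out.println(check(Arrays.asList("<", Arrays.asList("length", "abc"), 20))); // Boolean
    }
}
```

An application such as "+" applied to a text literal is rejected by check without any evaluation function being executed, which is the point of performing the check statically.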
Language schema is a source of knowledge about the language, the use of which is not limited to the static checking. One of the uses of the knowledge is an interactive help system in a development environment. It can display documentation for language elements and also provide automatic completion of code.
IV. EXPERIMENTS
A prototype of the system based on the presented concepts was used to implement several experimental domain-specific languages. Here we present two of them to illustrate the use of the proposed approach.
A. State Machine Language
The state machine language is based on the example from [2]. It allows events, commands, and states to be defined. Each state can execute some commands and allows transitions to other states on specified events.
An example program in the language is presented in Fig. 4. Events and commands are defined inside the corresponding blocks. A colon operator (:) connects the name of an event (or command) with its code. Here the colon is a language element in the form of an infix operator that binds a new object to the name in the evaluation environment. The elements events and commands each define their own version of the colon operator locally inside a block. The colon is also part of the generic language syntax, where it introduces nested blocks and separates keys from values in map literals. A state definition also uses a block, together with the arrow operator (->) to express transitions.
You can see a fragment of the language schema in Fig. 5. Notice a nested element declaration (the colon operator inside the events element). This is used to declare local elements that will be accessible only inside a block defined by the outer element.
The schema also contains information about new names defined by elements. For example, the colon operator defines a global name corresponding to its first parameter for an object of type Event. The state element also defines a new name, but in this case it is defined backward. This means that the name can be used in a program before its declaration. Language implementation can retrieve a list of all backward defined symbols and initialize them in the beginning of the evaluation.
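The handling of backward defined names can be sketched as a two-pass evaluation. The Java code below is a simplified illustration, not the implementation of the prototype: state declarations are flattened into hypothetical triples of state name, event, and target state, and the names BackwardNamesDemo, State, and evaluate are assumptions of this example.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

class BackwardNamesDemo {
    static class State {
        final String name;
        final Map<String, State> transitions = new HashMap<>();
        State(String name) { this.name = name; }
    }

    // Each declaration is {stateName, event, targetState}: a flattened,
    // hypothetical form of a state definition with one transition.
    static Map<String, State> evaluate(String[][] declarations) {
        Map<String, State> env = new LinkedHashMap<>();
        // Pass 1: bind every backward defined name before evaluation starts.
        for (String[] d : declarations) {
            env.putIfAbsent(d[0], new State(d[0]));
        }
        // Pass 2: evaluate transitions; targets may be states declared later.
        for (String[] d : declarations) {
            State target = env.get(d[2]);
            if (target == null) {
                throw new IllegalStateException("undefined state: " + d[2]);
            }
            env.get(d[0]).transitions.put(d[1], target);
        }
        return env;
    }

    public static void main(String[] args) {
        String[][] program = {
            {"idle", "start", "active"},  // refers to "active" before its declaration
            {"active", "stop", "idle"}
        };
        Map<String, State> states = evaluate(program);
        System.out.println(states.get("idle").transitions.get("start").name); // active
    }
}
```

Without the first pass, the transition from idle to active could not be resolved, because active is declared only on the following line.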
A fragment of the implementation is presented in Fig. 6. It shows the evaluation functions for the elements state and ->. Every evaluation function is represented by a Java method with the special annotation @Element. The arguments of an element application are passed to the evaluation function as its parameters, and the symbolic passing mode of an argument is marked using a special annotation. Evaluation functions also have access to the evaluator object that holds the evaluation environment.
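For readers without access to the figure, the following Java sketch shows how annotation-driven evaluation functions of this kind could be dispatched using reflection. Only the annotation name @Element comes from the description above. Its name attribute, the @Symbolic parameter annotation, the Evaluator and ArithmeticModule classes, and the tree encoding (List = combination, String = symbol) are assumptions of this example and may differ from the prototype.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Marks a method as an evaluation function. The 'name' attribute is hypothetical;
// it covers elements whose names are not valid Java identifiers.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Element {
    String name() default "";
}

// Hypothetical marker for the symbolic passing mode.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.PARAMETER)
@interface Symbolic {}

class Evaluator {
    final Map<String, Object> env = new HashMap<>();
    private final Map<String, Method> functions = new HashMap<>();
    private final Map<String, Object> modules = new HashMap<>();

    // Collects all @Element methods of a language module.
    void register(Object module) {
        for (Method m : module.getClass().getDeclaredMethods()) {
            Element e = m.getAnnotation(Element.class);
            if (e == null) continue;
            String name = e.name().isEmpty() ? m.getName() : e.name();
            m.setAccessible(true);
            functions.put(name, m);
            modules.put(name, module);
        }
    }

    Object eval(Object node) {
        if (node instanceof String) return env.get(node); // symbol lookup
        if (!(node instanceof List)) return node;          // literal
        List<?> comb = (List<?>) node;
        Method m = functions.get(comb.get(0));
        if (m == null || m.getParameterCount() != comb.size() - 1) {
            throw new IllegalArgumentException("bad application: " + comb);
        }
        Annotation[][] marks = m.getParameterAnnotations();
        Object[] args = new Object[comb.size() - 1];
        for (int i = 0; i < args.length; i++) {
            boolean symbolic = false;
            for (Annotation a : marks[i]) symbolic |= a instanceof Symbolic;
            args[i] = symbolic ? comb.get(i + 1) : eval(comb.get(i + 1));
        }
        try {
            return m.invoke(modules.get(comb.get(0)), args);
        } catch (ReflectiveOperationException ex) {
            throw new IllegalStateException(ex);
        }
    }
}

class ArithmeticModule {
    private final Evaluator evaluator;
    ArithmeticModule(Evaluator evaluator) { this.evaluator = evaluator; }

    @Element(name = "+")
    Object plus(Integer a, Integer b) { return a + b; }

    @Element(name = "const")
    Object constant(@Symbolic String name, Integer value) {
        evaluator.env.put(name, value); // binds the name in the environment
        return value;
    }

    public static void main(String[] args) {
        Evaluator evaluator = new Evaluator();
        evaluator.register(new ArithmeticModule(evaluator));
        evaluator.eval(Arrays.asList("const", "x", Arrays.asList("+", 1, 2)));
        System.out.println(evaluator.eval(Arrays.asList("+", "x", 4))); // 7
    }
}
```

The design keeps evaluation functions as ordinary typed Java methods, while the passing mode of each argument is read from the method signature rather than configured separately.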
B. Entities Language
Another developed language provides an example of the composition of language libraries. The language allows entities and their properties to be defined. It also provides a way to define validation rules as boolean expressions. The language includes modules providing relational and logical operators. The result of evaluating a program in this language is generated Java code containing classes that represent the specified entities.
An example program in the entities language is presented in Fig. 7. It defines two entities, Employee and Department, and lists their properties with types. The definition also contains validation constraints on the length of some properties.
The language implementation itself defines only a few elements, such as module, entity, and validate, and property types like int and string. Operators used in validation rules are imported from external modules. An exception is the length function, which provides the values for the operators to work on.
Validation expressions cannot be evaluated at the time the program is evaluated, since they operate on values that will be known only when the generated code is executed. This means that the result of evaluating the validation rules is generated Java code representing these expressions. This requires evaluation on request, where language elements do not execute their corresponding operations but instead return special objects that contain all operation parameters and a method for evaluating the operation. These objects also provide a method for generating Java code corresponding to the operation.
In fact, most of the elements from language libraries return such objects. This allows the libraries to be used in both cases: where direct evaluation is needed and for code generation. At the same time, if other language elements expect the result of the evaluation, these objects are transparently evaluated by the framework. Conversely, if an element expects an object for evaluation on request but receives a plain value, the value is automatically wrapped.
An example of this technique is presented in the implementation of the length element in Fig. 8. It returns a special object implementing the Producer interface, with a toCode method for code generation.
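A possible form of such objects is sketched below in Java. The interface name Producer and its toCode method follow the description above; the evaluate method, its Map parameter holding property values, the helper class Producers, and the exact generated code are assumptions of this example and do not reproduce the prototype.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical shape of the special object: it can be evaluated on request
// or asked for equivalent code in the target language.
interface Producer<T> {
    T evaluate(Map<String, Object> properties);
    String toCode();
}

class Producers {
    // Wraps a plain value so it can be used where a Producer is expected.
    static Producer<Integer> constant(final int value) {
        return new Producer<Integer>() {
            public Integer evaluate(Map<String, Object> properties) { return value; }
            public String toCode() { return Integer.toString(value); }
        };
    }

    // Length of a string property of the validated entity.
    static Producer<Integer> length(final String property) {
        return new Producer<Integer>() {
            public Integer evaluate(Map<String, Object> properties) {
                return ((String) properties.get(property)).length();
            }
            public String toCode() { return property + ".length()"; }
        };
    }

    // A relational operator as it might come from a reusable library module.
    static Producer<Boolean> lessThan(final Producer<Integer> left,
                                      final Producer<Integer> right) {
        return new Producer<Boolean>() {
            public Boolean evaluate(Map<String, Object> properties) {
                return left.evaluate(properties) < right.evaluate(properties);
            }
            public String toCode() {
                return "(" + left.toCode() + " < " + right.toCode() + ")";
            }
        };
    }

    public static void main(String[] args) {
        Producer<Boolean> rule = lessThan(length("name"), constant(20));
        // Code generation: the rule becomes an expression in the generated class.
        System.out.println(rule.toCode()); // (name.length() < 20)
        // Direct evaluation: the same rule applied to concrete property values.
        System.out.println(rule.evaluate(
            Collections.<String, Object>singletonMap("name", "Alice"))); // true
    }
}
```

The same rule object serves both evaluation models, which is what allows an operator library to be shared between an interpreting language and a code-generating one.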
V. RELATED WORKS
The goal of the proposed framework is to simplify development of domain-specific languages compared to external DSLs by leaving out the need to specify details of concrete syntax. It also allows composition of languages and reuse of their parts thanks to the common concrete syntax and evaluation system.
However, several other tools and techniques with similar goals exist. First of all, there are language workbenches, which provide integrated environments both for developing languages and for using them [12]. Some of them, such as the Intentional Workbench [13] and MPS [14], use projectional editing. This means that the primary representation of the code is an internal graph-based structure, and the editor provides only a projection of this internal representation. Projectional editing allows composition of languages and language libraries because it avoids the problems of grammar composition [15]. On the other hand, it does not allow the use of existing tools that expect a textual representation of the code, such as revision control systems.
There are also language workbenches based on a textual representation of programs, for example Spoofax [16] and Xtext [17]. They provide tools for defining the syntax and semantics of a language, as well as its editor. In contrast, the proposed framework does not require the concrete syntax of a language to be defined. This makes language definition simpler, but at the same time less flexible.
There is another proposal for a generic language, called Gel [18]. It defines a rich generic syntax similar to existing languages such as Java or CSS. However, it does not propose tools for language definition and processing based on the syntax, and its syntax is much richer than ours, which also makes it more complex.
VI. CONCLUSIONS
This paper presented a new approach to DSL development based on a common generic syntax. It allows a language to be developed as a collection of elements. Properties of the elements are declared in the language schema, and their evaluation is defined by evaluation functions written in the implementation language. Elements are grouped into modules that can be included in other languages as language libraries.
The approach was demonstrated on example languages developed using the prototype of the proposed framework. They show that it is possible to modularize language definition based on the proposed approach and to use defined language modules in languages with different evaluation models.
Our further research in this area may include the development of additional tools for language processing and use. Another goal is to increase the expressive power of the language schema, which would allow more precise static as well as dynamic checking of programs.
ACKNOWLEDGEMENTS
This work was supported by project VEGA 1/0341/13 "Principles and methods of automated abstraction of computer languages and software development based on the semantic enrichment caused by communication."
REFERENCES
[1] M. Mernik, J. Heering, and A. M. Sloane, "When and how to develop domain-specific languages," ACM Comput. Surv., vol. 37, no. 4, pp. 316-344, 2005.
[2] M. Fowler, Domain Specific Languages. Addison-Wesley Professional, 2010.
[3] S. Johnson, "Yacc: Yet another compiler-compiler," tech. rep., Bell Laboratories, Murray Hill, NJ, 1978.
[4] T. Parr, The Definitive ANTLR Reference: Building Domain-Specific Languages. Pragmatic Bookshelf, 2007.
[5] T. Bray, J. Paoli, C. Sperberg-McQueen, E. Maler, and F. Yergeau, Extensible Markup Language (XML) 1.0, 1998.
[6] J. McCarthy, "Recursive functions of symbolic expressions and their computation by machine, Part I," Commun. ACM, vol. 3, pp. 184-195, April 1960.
[7] T. Parr, "Humans should not have to grok XML," Aug. 2001. Available at http://www-106.ibm.com/developerworks/xml/library/x-sbxml.html
[8] M. Voelter, "From programming to modeling - and back again," IEEE Software, vol. 28, pp. 20-25, 2011.
[9] J. Bachrach and K. Playford, "D-expressions: Lisp power, Dylan style," 1999. Available at http://people.csail.mit.edu/jrb/Projects/dexprs.pdf
[10] S. Chodarev, M. Vagac, and J. Kollár, "Proposal of generic syntax for domain-specific languages," Journal of Electrical and Electronics Engineering, vol. 5, no. 2, 2011.
[11] D. Crockford, "The application/json Media Type for JavaScript Object Notation (JSON)," RFC 4627 (Informational), July 2006.
[12] M. Fowler, "Language workbenches: The killer-app for domain specific languages?," 2005. Available at http://martinfowler.com/articles/languageWorkbench.html
[13] C. Simonyi, M. Christerson, and S. Clifford, "Intentional software," in OOPSLA '06: Proceedings of the 21st annual ACM SIGPLAN conference on Object-oriented programming systems, languages, and applications, (New York, NY, USA), pp. 451-464, ACM, 2006.
[14] S. Dmitriev, "Language oriented programming: The next programming paradigm," November 2004. Available at http://www.jetbrains.com/mps/docs/Language_Oriented_Programming.pdf
[15] M. Voelter, "Language and IDE modularization, extension and composition with MPS," Pre-proceedings of Summer School on Generative and Transformational Techniques in Software Engineering (GTTSE), pp. 395-431, 2011.
[16] L. C. L. Kats and E. Visser, "The Spoofax language workbench. Rules for declarative specification of languages and IDEs," in Proceedings of the 25th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2010, pp. 444-463, 2010.
[17] M. Eysholdt and H. Behrens, "Xtext: implement your language faster than the quick and dirty way," in Proceedings of the ACM international conference companion on Object oriented programming systems languages and applications companion, SPLASH '10, pp. 307-309, ACM, 2010.
[18] J. Falcon and W. R. Cook, "Gel: A generic extensible language," in Proceedings of the IFIP TC 2 Working Conference on Domain-Specific Languages, (Berlin, Heidelberg), pp. 58-77, Springer-Verlag, 2009.
CHODAREV Sergej
Technical University of Kosice, Slovak Republic,
Department of Computers and Informatics, Faculty of Electrical Engineering and Informatics
Letná 9, 042 00 Kosice, Slovak Republic, E-Mail: [email protected]
Copyright University of Oradea May 2013