Lang Res Eval (2006) 40:87–107
DOI 10.1007/s10579-006-9010-8
ORIGINAL PAPER
Hans Dybkjær · Laila Dybkjær
Published online: 8 November 2006 © Springer Science+Business Media B.V. 2006
Abstract DialogDesigner is an integrated design and development environment that supports dialogue designers in creating an electronic dialogue model, writing dialogue snippets, running and analysing simulation sessions, getting graphical views of the model, making automatic evaluation regarding dialogue model well-formedness, compiling the model into run-time code, and extracting different presentations. DialogDesigner has been used for research purposes as well as in commercial projects. Its primary focus is on providing support for the development process. We explain underlying ideas, illustrate the functionality of DialogDesigner and discuss its strengths.
Keywords Spoken dialogue systems · Dialogue model · Development and evaluation process · Tools support
1 Introduction
Prolog Development Center A/S is a company that produces spoken dialogue systems (SDSs). This has led to a need for tools in support of SDS design and development beyond the mostly coding-oriented development tools commonly available for VoiceXML or as part of commercial telephony platforms. To meet this need we have created an integrated design and development environment (IDE), DialogDesigner, centred around a generic dialogue model and incorporating a set of tools operating on that model.
The primary motivation has been to achieve (i) a more cost-efficient system development process with user involvement by supporting rapid dialogue model design and evaluation, while at the same time (ii) ensuring efficient and easy-to-use SDSs.
H. Dybkjær (✉)
Prolog Development Center A/S, H. J. Holst Vej 3C-5C, 2605 Brøndby, Denmark
e-mail: [email protected]
L. Dybkjær
Natural Interactive Systems Laboratory, University of Southern Denmark, Campusvej 55, 5230 Odense M, Denmark
e-mail: [email protected]
DialogDesigner: tools support for dialogue model design and evaluation
The work on a dialogue model does not start where one already has a precise idea of what the model is going to look like. Often the point of departure is snippets of concrete dialogue that illustrate use cases, scenarios or parts of scenarios, and the dialogue model emerges on the background of these. DialogDesigner supports the design process from the very beginning by enabling this approach to dialogue model design. Alternatively, it is also possible to start using DialogDesigner by entering an early version of the dialogue model. In either case, as soon as a first electronic dialogue model has been created, support may be provided for its further development and evaluation by enabling access to a suite of tools, as also pointed out by Harris (2005).
The target user group for DialogDesigner includes SDS designers and developers. Programming expertise is not required by designers though they should have a solid understanding of formalisation of dialogue modelling and understand the related terminology.
The first version of DialogDesigner was built in early 2005 while a second and extended version was implemented in 2006, adding, among other things, support for automatic analyses of dialogue models, snippet design, as well as compilation of the model into runtime code. The tool is continuously being improved and expanded. It is implemented in C# and runs on a Windows platform. So far the first version has been used during development of a commercial traffic information system and in a commercial auto-attendant system. The second version is new but has been used in a commercial project that extends the traffic information system and in an update of the auto-attendant system. Furthermore, two demos have been designed for testing and exploration purposes. A pizza ordering demo was designed using version one while a calendar application allowing students to book a time slot in the teacher's calendar for discussion of their project was designed with version two.
In the following we present DialogDesigner in more detail. Section 2 presents the goals of DialogDesigner, how it supports the development process, and how the outcome relates to a general SDS architecture. Section 3 deals with approaches to dialogue models and describes how it is done in DialogDesigner. Section 4 describes the tools included in DialogDesigner in support of the development process, including a dialogue snippet design tool, a simulation tool, a visualisation tool, two analysis tools and a code generation tool. Section 5 presents the possibility for generation of various presentations of the dialogue model. Section 6 describes DialogDesigner in relation to its goals and discusses its strengths. Section 7 discusses and concludes on the presented work and outlines future work.
2 Goals and the development process
The primary aim of DialogDesigner is to support an efficient, iterative development process with customer and user involvement. In (Dybkjær & Dybkjær, 2004b) we identified three main problems for SDS development, i.e. complex systems, communicating the dialogue model to stakeholders, and efficient code development. The following list of goals for DialogDesigner has its origin in these identified problems.
Contemporary dialogue complexity. The IDE should support universal modelling of today's systems. This includes:
Task-oriented dialogues using limited natural language within specific delimited task domains. Full natural conversation in open domains is not considered.
Heterogeneous tasks, i.e. several tasks that the user may choose from or arbitrarily switch between.
Communicate with developers, customers, and users about SDS design. This includes:
Presentation of the dialogue flow in a way that is intuitively easy to understand.
Simulating dialogues with a tool that builds on the electronic dialogue model. This can be used for walkthroughs among developers and with customers, or for pre-implementation tests with users (Wizard of Oz).
Lists of prompts and phrases for validation by customers and for recording by sound studios. Speech synthesis is for many languages still not of sufficient quality to be used in commercial walk-up-and-use applications, and even if it is, some sculpturing may be needed.
Efficient development of code. This includes:
Work separation. People with different expertise should be able to work on different parts of the system: grammars, flow, prompts, recording, coding.
Reusing flow in different parts of the system.
Code generation, making implementation, dialogue model, and presentations consistent with each other.
Automatic analyses of the model for consistency, well-formedness, etc.
2.1 Development process support
DialogDesigner is intended to form part of concrete system development processes. It provides support for a highly iterative approach to dialogue design, which is known to be an efficient way of building systems of high quality and which is in line with modern life cycle models such as the Unified Process (UP) model (Jacobson, Booch, & Rumbaugh, 1999), cf. Table 1, and various agile methods, e.g. (Beck, 1999). The use case driven development approach of UP is supported in DialogDesigner via the dialogue snippet tool, Sect. 4.1.
2.2 DialogDesigner and the SDS architecture
DialogDesigner is an off-line tool, i.e. it is not part of a runtime dialogue system, cf. Fig. 1. The designer edits a dialogue model which is then compiled into the control structure of the dialogue manager. As part of the dialogue model the designer specifies the focus in terms of grammar names, and the prompt phrases are extracted for use in the runtime system, and mapped to the sound files to be played (if speech synthesis is not used).
DialogDesigner is independent of the runtime system. Currently we use HotVoice from Dolphin, but support for other systems could be added. For instance, we plan to add compilation into VoiceXML. When a new dialogue system is being built, the semantics and the domain model (application data and business logic) must be hand coded. Predicates, variables, and frame-like structures used in actions and conditions in the dialogue model within DialogDesigner must be defined in the domain model.
Table 1 DialogDesigner in the development process
Phase: Activity
Inception: Requirements are written with focus on functionality (use cases). DialogDesigner: The main flow is outlined in the dialogue model. A first set of key dialogue examples and variants are defined as snippets. No attempt is made at mapping the examples into the model.
Elaboration: The domain model is analysed, formalising frames, variables, their dependencies, constraints, and possible origins of data. The dialogue is designed. DialogDesigner: More snippets are defined. The prompts and the flow of the dialogue are formalised. The simulation facility is used to evaluate the design, both by using simple walkthrough and Wizard of Oz (WOZ), and both internally in the design group and with user or customer representatives. All snippets are mapped into the model, formally ensuring consistency. All prompts and transitions are assigned act-topics. Snippets are extracted for inclusion in updated requirements documents, supplemented by state graphs for overview.
Construction: The program is written including all domain logic and application and database communication. DialogDesigner: Domain state predicates and formal transition, state and prompt conditions are added to the model. The model is repeatedly checked using the three analyses: basic model health check, snippet consistency check, and act-topic check. Prompt and phrase lists are extracted for use by the voice designer. Model and prompt reports are extracted for (final) validation and acceptance by the customer. Code is generated for testing the real system.
Transition: Final testing as well as installation at the customer site. DialogDesigner: Snippet report is extracted for systematic test. Code is generated for deployment.
Fig. 1 Generic architecture. DialogDesigner is off-line, but relates to various parts of the runtime system
3 Dialogue models
We shall now take a closer look at what kind of dialogue model is suitable. Many approaches today aim at supporting conversational dialogue, e.g. Collagen (Rich, Sidner, & Lesh, 2001) which employs a discourse structure approach to dialogue modelling, based on the attention/intention/linguistics theory of discourse structure (Grosz & Sidner, 1986), supplemented by a partial planning module.
However, we find that our goal of making it easy to convey the main structures and prompts to stakeholders, and our delimitation to task-oriented dialogues, makes it necessary and sufficient to take the more explicit approach of using dialogue graphs as a basis for dialogue modelling. In the next two subsections we first describe three different graph-based approaches and their advantages and disadvantages, and then briefly present the synthesized approach taken in DialogDesigner.
3.1 Approaches to dialogue modelling
While human–human dialogue may be seen as a joint effort containing lots of overlaps and interweaving utterances that together constitute the overall discourse (Steensig, 2001), this complex interpretation needs to be simplified when it comes to spoken human–computer dialogue. This is not least due to the current state of the art in input/output technology (speech recognition and generation) which is needed for spoken human–computer interaction and which is unable to handle the complexity often found in human–human dialogue. Therefore we only consider dialogues that consist of a number of alternating system and user turns, with the possible addition of barge-in handling. Since our focus is on computational dialogue models, we may view a dialogue model as a program and the set of all possible dialogues enabled by the dialogue model as the set of all paths through the program. Specifically, we shall view these paths as graphs consisting of states connected by transitions.
Let us consider three different pure ways of composing dialogue models as graphs, cf. Fig. 2. While all of them are graphs, there are crucial differences in the design and computational options:
State production systems: The dialogue is a set of states. Each state is guarded by a condition or priority, has a prompt, has a focus grammar, and may change the global context. The basic loop is to select the state with the highest priority among those with satisfied state conditions, play its prompt, wait for input according to the grammar, change the context, and start over again.
Transition production systems: The dialogue is a set of conditional transitions which can provide feedback. There is one state waiting for input, corresponding to having all input enabled all the time. The basic loop is to select a transition with a true condition, change the context, play its prompt, wait for input in the state, and start over again.
Flow charts: The dialogue is a set of states connected by conditional transitions. The basic loop is to play the state prompt, wait for input, change the context, select a transition with a true condition, advance to the state it points to, and start over again.
Fig. 2 Modelling via state or transition production systems (left and middle), or flow charts (right). Rounded boxes are states. A box with a horizontal split line is conditional. Small boxes in the middle of arrows are feedback. Arrows with filled heads are conditional
While the production system type of model provides a dynamic and flexible computation, it provides no structural hints. This is in contrast with the flow chart that exhibits a clear structure of the overall discourse, but is inflexible, and when many details are modelled, the structure tends to become cluttered.
3.2 The DialogDesigner model
In DialogDesigner we combine the presented models. A dialogue model has a set of states connected via conditional transitions. The states are hierarchically grouped which improves the overview of the model. Moreover, the groups may function as targets of transitions such that each group of (conditional) states may function as a state production system.
Each state in a dialogue model has a set of zero or more conditional prompts (system utterances) attached, as has each transition. However, a state may or may not accept user input which means that a system turn may continue across more than one state and may include several system utterances. A user turn, on the contrary, is restricted to one utterance only (at least in the present version of DialogDesigner) and is only possible in those states which accept user input.
Figure 3 illustrates a partial dialogue model depicted as a graph. The basic processing loop is the following:
1. Enter the first state.
2. Play the prompt, if any.
3. If the state accepts user input, wait for it, resolve it and update the discourse and domain representations.
4. Select a transition with satisfied condition. Play the prompt, if any.
5. Resolve the target of the transition: If the target is a state, enter it. If the target is a group, select a state with satisfied condition in the group.
6. Go to 2.
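The loop above can be sketched in Python as follows. This is only an illustration under stated assumptions: the names `State`, `Transition`, `Group` and `run`, the representation of conditions as callables over a context dictionary, and the convention that the dialogue ends when no transition applies are all hypothetical and are not taken from DialogDesigner itself.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Union

Context = Dict[str, object]
Condition = Callable[[Context], bool]


def always(_ctx: Context) -> bool:
    """Default condition: always satisfied."""
    return True


@dataclass
class Transition:
    target: Union["State", "Group", None]
    condition: Condition = always
    prompt: Optional[str] = None          # feedback prompt, if any


@dataclass
class State:
    name: str
    prompt: Optional[str] = None
    accepts_input: bool = False
    condition: Condition = always         # only used when selected via a group
    transitions: List[Transition] = field(default_factory=list)


@dataclass
class Group:
    name: str
    states: List[State] = field(default_factory=list)


def run(first: State, inputs: List[str], ctx: Context) -> List[str]:
    """Walk the model and return the prompts played (hypothetical helper)."""
    played: List[str] = []
    pending = iter(inputs)
    state: Optional[State] = first                    # step 1
    while state is not None:
        if state.prompt:                              # step 2
            played.append(state.prompt)
        if state.accepts_input:                       # step 3 (no real recognition)
            ctx["last_input"] = next(pending, None)
        chosen = next((t for t in state.transitions if t.condition(ctx)), None)
        if chosen is None:                            # assumption: nothing applies, stop
            break
        if chosen.prompt:                             # step 4
            played.append(chosen.prompt)
        target = chosen.target                        # step 5
        if isinstance(target, Group):
            target = next((s for s in target.states if s.condition(ctx)), None)
        state = target                                # step 6
    return played
```

Selecting the first state of a group whose condition holds is what lets a group behave like a small state production system inside an otherwise flow-chart-like model.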
Figure 4 is a screen shot of the DialogDesigner design window that is used for entering a dialogue model. The figure shows the state hierarchy (D1), the state information including name and condition (D2), the state prompt set (D3), the transitions (D4) including name, condition, and target, and the transition prompt set (D5) of the selected transition.
Fig. 3 A partial dialogue model. Rounded corner squares are states, and the circle is a group of states with member states attached via lines. The model has conditions (Cs) on three of the seven shown states, several states have conditional prompts (Cp) grouped together, three transitions play feedback prompts (F), and all transitions are conditional (filled arrow heads) except the entry
Fig. 4 The design window. Encircled numbers are referenced in the text
DialogDesigner enables a static inclusion mechanism as well as a dynamic continue primitive, both of which support reuse via sub-structuring of the dialogue model. Transitions may include the transitions of another state. In Fig. 4, the curly brackets enclosing the transition {Commands} indicate that all transitions listed in the Commands state will be included here. Dynamic excursions to sub-dialogues are modelled via the Continue column which may indicate where to proceed when returning from a visit to a target state which just has a continue marker in the Target column. This may be used e.g. for modelling a generic help functionality.
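As an illustration of the static inclusion mechanism, the following Python sketch flattens `{Name}` entries into the transitions of the named state. The function name `expand`, the dictionary representation of the model, and the cycle check are assumptions made for this example; the text does not describe how DialogDesigner resolves inclusions internally.

```python
from typing import Dict, FrozenSet, List


def expand(state: str, model: Dict[str, List[str]],
           _seen: FrozenSet[str] = frozenset()) -> List[str]:
    """Flatten the transition list of `state`.

    Each entry written as '{Name}' is replaced by the (recursively expanded)
    transitions of the state called Name; other entries are kept as they are.
    """
    if state in _seen:
        # Assumption: cyclic inclusion is treated as an error in this sketch.
        raise ValueError(f"cyclic inclusion via {state!r}")
    seen = _seen | {state}
    result: List[str] = []
    for entry in model[state]:
        if entry.startswith("{") and entry.endswith("}"):
            result.extend(expand(entry[1:-1], model, seen))
        else:
            result.append(entry)
    return result


# Hypothetical model: state name -> transition names.
model = {
    "Commands": ["help", "repeat", "goodbye"],
    "AskDate": ["date given", "{Commands}"],
}
print(expand("AskDate", model))
```

Keeping the shared command transitions in one state means a change to, say, the help transition is picked up everywhere the state is included.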
3.2.1 Act-topic annotation
Act-topic annotation represents yet another way in which to impose structure on the dialogue model and provides a basis for testing certain properties of the model, cf. Sects. 4.4 and 4.5. Our approach builds on ideas presented in (Dybkjær & Dybkjær, 2004a). See also (Dybkjær & Dybkjær, 2006) concerning speech acts and the use of act-topic annotation.
When a dialogue model is entered in DialogDesigner, it is possible to annotate each prompt and each transition (where user input is expected) with speech acts and topics, cf. Fig. 4. Only one speech act can be assigned per prompt and one per user input, while more topics may be assigned in both cases. This is a simplification which may not be entirely correct since a prompt or a user utterance may indeed include more than one speech act.
Speech acts and topics are an abstraction that tells something about what happens when we select a particular entry in the dialogue model. They must be assigned manually. No automatic support is available for proposing an act or one or more topics for a prompt or an expected user utterance.
DialogDesigner comes with a set of 14 default speech acts, but may be configured to other sets. The 14 default speech acts are: accept, check (if understanding was correct), clarify (something ambiguous), feedback, hangup, inform, offer, other (i.e. unclear or null action), pause, reject, repair, repeat, request, and select. We don't believe in the possibility of a standard set of speech acts because what is an appropriate set of speech acts is highly dependent on the sort of analysis one wants to perform. However, we do believe that some reuse of speech acts is possible across applications, see also (Dybkjær & Dybkjær, 2006). Topics, on the other hand, are highly domain and task dependent. Therefore the user of DialogDesigner always has to define his own set of topics for a dialogue model.
The speech act annotation is not used at runtime. It is used for analysis of dialogue model well-formedness, see Sects. 4.4 and 4.5. However, the annotation may influence the implementation. The annotation may be understood as a signal to the implementor. For example, a feedback act might require a shorter subsequent pause before timeout than a request for information.
4 Tools
While the dialogue model is the central object of the development process, the tools offered by DialogDesigner are key to ensuring an efficient process, cf. the goals listed in Sect. 2 and Table 1. In the present section we describe these tools which include support for dialogue snippet design (Sub-sect. 4.1), use of walkthrough and WOZ simulation (Sub-sect. 4.2), graphical visualisation of the dialogue model (Sub-sect. 4.3), analysis of aspects of well-formedness of the dialogue model in terms of a number of health analyses (Sub-sect. 4.4) and in terms of an analysis based on act-topic patterns (Sub-sect. 4.5), and code generation (Sub-sect. 4.6).
4.1 Snippet design
Often dialogue modelling takes its point of departure in scenarios or in subparts of scenarios. We design the dialogue model based on knowledge of concrete situations that the SDS must be able to deal with, and we have concrete ideas of formulation, style, and flow of exchanges. When during development we want to evaluate the emerging dialogue model, e.g. via walkthroughs or WOZ simulation, scenarios are again important. Later in the development process scenarios remain important for evaluating the implemented dialogue model as well.
In DialogDesigner we use the term dialogue snippet to denote (part of) a concrete dialogue. Often the snippet will be equivalent to an entire scenario corresponding to a specific use case variation, e.g. booking a one-way ticket from Copenhagen to Aalborg for a particular time and date. However, if certain parts of the system-user dialogue are expected to have many important variations, it is practical to focus on these parts rather than having to create full scenarios all the time. Thus snippets may cover as little as a single utterance or a single exchange between the user and the system, e.g. eliciting a date from the user.
It is possible in DialogDesigner to start designing dialogue snippets even before any dialogue model has been defined. Figure 5 shows an example of a dialogue snippet. This is likely to be the way many dialogue model designers would prefer to work with the snippet tool in the requirements writing and early design sketching phases. Focus is on formulations and dialogue design, and snippets are often entered and/or verified in cooperation with user representatives. One could call this approach design by example.
Once a dialogue model has been entered, the designer may begin to map the snippets into the model by assigning states and transitions to each turn in the snippet.
A snippet that has been mapped into a dialogue model can be verified automatically against the model to see if the turn sequence of the snippet is (still) compatible with the model. Therefore such snippets form the basis for regression test of the dialogue model whenever it changes.
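A minimal Python sketch of such a compatibility check is given below. It assumes a much simplified representation that is not taken from DialogDesigner: the model is a mapping from state names to their named transitions and targets, a mapped snippet is a list of (state, transition) steps, and conditions and prompt texts are ignored. The names `verify`, `Model` and `Step` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Model = Dict[str, Dict[str, str]]   # state -> {transition name -> target state}
Step = Tuple[str, str]              # (state, transition taken from that state)


def verify(snippet: List[Step], model: Model) -> Optional[str]:
    """Return None if the mapped snippet is a path in the model, else an error text."""
    for i, (state, transition) in enumerate(snippet, start=1):
        if state not in model:
            return f"turn {i}: unknown state {state!r}"
        if transition not in model[state]:
            return f"turn {i}: state {state!r} has no transition {transition!r}"
        if i < len(snippet):
            expected = snippet[i][0]            # state of the following step
            actual = model[state][transition]
            if actual != expected:
                return f"turn {i}: {transition!r} leads to {actual!r}, not {expected!r}"
    return None


# Hypothetical calendar model and two mapped snippets.
model = {
    "AskDate": {"date": "AskTime"},
    "AskTime": {"time": "Confirm"},
    "Confirm": {},
}
print(verify([("AskDate", "date"), ("AskTime", "time")], model))
print(verify([("AskDate", "fully booked")], model))
```

Running all mapped snippets through a check of this kind after each model change is what makes them usable as a regression test suite.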
Figure 6 shows the result of a mapping between a snippet and the dialogue model where the error message (top right) concerning turn 5 implicitly reveals that the dialogue model does not take into account that the teacher's calendar may be fully booked on a particular date. Having fixed this error, we would still get warnings about the difference between snippet and model prompts in turns 4 and 5.
Fig. 5 The basic window used for dialogue snippet design
Fig. 6 The window used for dialogue snippet design showing mapping between the specified snippet and the dialogue model. To the left: the specified set of snippets and their status indicated by smileys. In the middle: the selected snippet with its turn sequence. To the right: states and transitions that can be inserted as turns in the snippet. The window has been arranged for readability in this document so that only two of the transition columns are visible; in normal use all the columns visible in the design view (Fig. 4) are shown
4.2 Simulation
The snippet tool provides support for simulation techniques such as scenario-based walkthroughs or WOZ. Snippets may be generated from the dialogue model (displayed to the right in Fig. 6) by selecting states and transitions which are then inserted into the active snippet. Since conditions are not evaluated during simulation, the designer is asked to choose states, prompts, and transitions whenever ambiguity arises.
Walkthroughs of dialogue models can profitably be done by designers or developers with the purpose of discovering missing or flawed functionality and inappropriate interaction which is likely to cause problems for users. Walkthroughs may be based on scenarios which are made at the beginning of the design process or generated on the fly.
WOZ sessions are typically scenario-based. Preferably representative users should be involved to collect reliable data. However, for early and rough tests colleagues, customer employees, or other persons at hand are very useful to get an overall impression of the extent to which the system seems to work and where major pitfalls may be.
Walkthrough and WOZ sessions are saved in the same way as snippets. Saved sessions may always be opened as any other snippets for inspection, editing and commenting which may be useful for analysis. It is also possible to generate a report showing one or all sessions in HTML format, cf. Fig. 7.
The simulation feature can be used normatively to generate snippets as test scripts. These may then be used in a systematic functionality test of the implemented SDS.
The snippet tool may be used during presentation and discussion sessions with customers and end-users to demonstrate e.g. dialogues for typical scenarios. It is also possible to use the tool and create (partial) scenarios during discussion with stakeholders.
4.3 Graphical visualisation
Since DialogDesigner is based on a kind of conditional graphs, it seems natural to display the dialogue model graphically. Thus DialogDesigner has a graph tool for displaying dialogue models, cf. Fig. 8. It is not necessary to have a fully finalised dialogue model before one can benefit from the graphical view. In fact, we recommend running rapid, possibly incremental, cycles that use much of the tool functionality in DialogDesigner iteratively, including the graphical view, in parallel with dialogue model design.
Showing all states of the entire model in a graph is not useful, since graphs at even a modest level of complexity become cluttered. But visualisation of the groups together with selected transitions (often domain and maybe command transitions) provides a nice overview. Another useful view is to fully expand a node with all the ingoing and outgoing transitions: This provides a nice overview of the connectivity of the state in focus and whether some transitions are missing.
4.4 Health analyses
DialogDesigner supports four kinds of automatic analysis of the dialogue model regarding its well-formedness. We call these analyses health analyses. Two of these are based on act-topic annotation of the dialogue model, while the other two analyses are not. We recommend using the health analyses iteratively from early on. They check simple aspects of well-formedness and help discover design flaws which should preferably be corrected prior to a simulation session.
Fig. 7 Snippet report
Fig. 8 A graphical view of part of a dialogue model. States are drawn as ellipses, groups as double ellipses, and transitions as rectangles. Domain, system, and universal (or command) transitions are coloured and marked differently. Transitions are directed; the small circles may be interpreted as arrows
The two analyses not based on act-topic annotation check all states for
reachability, i.e. whether it is possible from the initial state to reach any other state defined in the dialogue model, and
re-entrance, i.e. whether one can get back to each state in a finite number of steps. Self-transitions are ignored in this analysis.
For both analyses the output per state is either a warning that the state is not reachable or not re-entrant, or information on the minimum number of steps it takes to reach or get back to the state in question, cf. Fig. 9.
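Both checks can be expressed as breadth-first searches over the state graph. The Python sketch below is one possible formulation under assumptions of our own: the graph is a plain mapping from each state to its target states (every state appears as a key), conditions are ignored, and the function names are hypothetical. A result of `None` stands for the warning case.

```python
from collections import deque
from typing import Dict, List, Optional

Graph = Dict[str, List[str]]   # state -> target states of its transitions


def distances(graph: Graph, starts: List[str]) -> Dict[str, int]:
    """Breadth-first search; the start states get distance 0."""
    dist = {s: 0 for s in starts}
    queue = deque(starts)
    while queue:
        state = queue.popleft()
        for target in graph.get(state, []):
            if target not in dist:
                dist[target] = dist[state] + 1
                queue.append(target)
    return dist


def reachability(graph: Graph, initial: str) -> Dict[str, Optional[int]]:
    """Minimum number of steps from `initial` to each state, or None."""
    dist = distances(graph, [initial])
    return {state: dist.get(state) for state in graph}


def reentrance(graph: Graph) -> Dict[str, Optional[int]]:
    """Minimum number of steps needed to get back to each state, or None."""
    result: Dict[str, Optional[int]] = {}
    for state in graph:
        successors = [t for t in graph[state] if t != state]  # ignore self-transitions
        dist = distances(graph, successors)
        result[state] = dist[state] + 1 if state in dist else None
    return result


# Hypothetical model with one unreachable state.
graph = {
    "Start": ["Menu"],
    "Menu": ["Menu", "Book", "Bye"],
    "Book": ["Menu"],
    "Bye": [],
    "Orphan": ["Menu"],
}
print(reachability(graph, "Start"))
print(reentrance(graph))
```

In this example `Orphan` is reported as unreachable, and `Start`, `Bye` and `Orphan` as not re-entrant, which may or may not be intended for entry and exit states.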
The two act-topic based health analyses check each prompt and each transition to see
if a speech act has been indicated and if it is one of the defined acts (listed automatically near the bottom of Fig. 9), and
if topics are used that are not in the list defined by the designer, cf. the bottom of Fig. 9.
The analyses issue a warning whenever they detect a missing or undefined act or topic.
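The two annotation checks amount to simple set membership tests. In the Python sketch below the 14 default speech acts are those listed in Sect. 3.2.1, while the entry format, the function name `check_annotations` and the warning texts are assumptions for illustration. The sketch only flags missing acts, undefined acts and undefined topics.

```python
from typing import Iterable, List, Optional, Set, Tuple

# The 14 default speech acts listed in Sect. 3.2.1.
DEFAULT_ACTS = {
    "accept", "check", "clarify", "feedback", "hangup", "inform", "offer",
    "other", "pause", "reject", "repair", "repeat", "request", "select",
}

# Hypothetical entry format: (name of prompt or transition, act, topics).
Entry = Tuple[str, Optional[str], List[str]]


def check_annotations(entries: Iterable[Entry], acts: Set[str],
                      topics: Set[str]) -> List[str]:
    """Return one warning per missing act, undefined act, or undefined topic."""
    warnings: List[str] = []
    for name, act, entry_topics in entries:
        if not act:
            warnings.append(f"{name}: missing speech act")
        elif act not in acts:
            warnings.append(f"{name}: undefined speech act {act!r}")
        for topic in entry_topics:
            if topic not in topics:
                warnings.append(f"{name}: undefined topic {topic!r}")
    return warnings


entries = [
    ("AskDate.prompt", "request", ["date"]),
    ("AskDate.timeout", None, []),
    ("Confirm.prompt", "confirm", ["date", "room"]),
]
for warning in check_annotations(entries, DEFAULT_ACTS, {"date", "time"}):
    print(warning)
```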
Fig. 9 Health analyses
4.5 Using act-topic patterns in analysis
There is a second kind of automatic analysis which also exploits the act-topic annotation. This analysis requires the specification of act-topic patterns (also called rule patterns) and then allows for subsequent automatic analysis of whether the dialogue model conforms to the specified patterns.
Rule patterns are act-topic sequences written in the following form in BNF:
RULE = RULENAME: CONDITION ? SEQUENT
CONDITION = [^] TURN*
SEQUENT = TURN*
TURN = WHO (ACTTOPICS+) ;
WHO = s | u | _
ACTTOPICS = ACT {TOPIC*}
ACT = _ | ACTNAME
RULENAME = NAME(.NAME)*
Two examples are
testRequest.Inform: s(request{}) ; ? u(inform{}) ;
testRequest.InformTopic: s(request{month}) ; ? u(inform{month}) ;
The s (system) and u (user) are used to indicate who performs which act. Request and inform are speech acts. The {} indicates any topic(s), i.e. in the first example we don't care which topic(s) the system and the user are addressing whereas in the second example the topics must include month. The condition part is that the system has requested information. If this is the case and (in example two only) the topic is month, the analysis checks if the turn following the question mark is possible, i.e. if the user may provide information. In example two the information must specifically concern the topic month.
A third example is
testPause: s(_{}) ; ? u(pause{}) ; s(repair{}) ;
Repair, pause and _ are speech acts where _ means any speech act. The condition part is that the system has said something. If this is the case, the analysis checks if the turns following the question mark are possible, i.e. if the system can handle user silence by initiating repair. The analysis is performed in the same window as the health analyses, cf. Fig. 9 where rule patterns are grouped and listed to the left.
For each selected rule pattern the automatic analysis runs through the dialogue model looking for the condition part of the rule pattern in prompts and transitions. Whenever the condition part is found, the analysis will check if the turn or turns specied in the sequent in the rule pattern are also allowed for where the condition was found in the dialogue model.
The rules check for existence. This means that the analysis will succeed for a given state if just one match with the rule pattern is found. The analysis does not check if there are several matches for the same rule pattern in a particular state. Also, the analysis is an abstraction in the sense that it relies on the act-topic annotation without computing the condition elds of the dialogue model. In principle the act-topic annotation must be consistent with the conditions specied in the dialogue model. However, in practice the actual runtime conditions may turn out to not allow the path although the analysis shows that a path is possible.
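To make the existence check concrete, the following Python sketch parses rule patterns of the form shown above and tests them against a toy model. Several simplifications are assumptions of this sketch and not properties of DialogDesigner: each turn carries exactly one act-topic pair, the optional ^ marker is not handled, topics are separated by spaces or commas, and the model is reduced to a mapping from each annotated turn to the annotated turns that may follow it. All function names are hypothetical.

```python
import re
from typing import Dict, FrozenSet, List, Set, Tuple

Turn = Tuple[str, str, FrozenSet[str]]       # (who, act, topics)
TurnGraph = Dict[Turn, List[Turn]]           # turn -> turns that may follow it

TURN_RE = re.compile(r"\s*([su_])\(\s*(\w+)\s*\{([^}]*)\}\s*\)\s*;")


def parse_turns(text: str) -> List[Turn]:
    """Parse a sequence such as 's(request{month}) ; u(inform{month}) ;'."""
    text = text.strip()
    turns: List[Turn] = []
    pos = 0
    while pos < len(text):
        m = TURN_RE.match(text, pos)
        if not m:
            raise ValueError(f"cannot parse turn at: {text[pos:]!r}")
        who, act, topics = m.groups()
        turns.append((who, act, frozenset(topics.replace(",", " ").split())))
        pos = m.end()
    return turns


def parse_rule(rule: str) -> Tuple[str, List[Turn], List[Turn]]:
    """Split 'NAME: CONDITION ? SEQUENT' into its three parts."""
    name, body = rule.split(":", 1)
    condition, sequent = body.split("?", 1)
    return name.strip(), parse_turns(condition), parse_turns(sequent)


def matches(pattern: Turn, turn: Turn) -> bool:
    """'_' matches anyone or any act; pattern topics must all be present."""
    p_who, p_act, p_topics = pattern
    who, act, topics = turn
    return p_who in ("_", who) and p_act in ("_", act) and p_topics <= topics


def match_ends(model: TurnGraph, start: Turn, patterns: List[Turn]) -> Set[Turn]:
    """Last turns of all model paths that start at `start` and match `patterns`."""
    if not patterns or not matches(patterns[0], start):
        return set()
    if len(patterns) == 1:
        return {start}
    ends: Set[Turn] = set()
    for nxt in model.get(start, []):
        ends |= match_ends(model, nxt, patterns[1:])
    return ends


def follows(model: TurnGraph, turn: Turn, patterns: List[Turn]) -> bool:
    """True if some path continuing after `turn` matches `patterns`."""
    if not patterns:
        return True
    return any(matches(patterns[0], nxt) and follows(model, nxt, patterns[1:])
               for nxt in model.get(turn, []))


def violations(rule: str, model: TurnGraph) -> Set[Turn]:
    """Turns where the condition was found but the sequent is not possible."""
    _name, condition, sequent = parse_rule(rule)
    found: Set[Turn] = set()
    for turn in model:
        found |= match_ends(model, turn, condition)
    return {t for t in found if not follows(model, t, sequent)}
```

A small usage example with a hypothetical four-turn model:

```python
ask = ("s", "request", frozenset({"month"}))
answer = ("u", "inform", frozenset({"month"}))
ask_day = ("s", "request", frozenset({"day"}))
silence = ("u", "pause", frozenset())
model = {ask: [answer], answer: [ask_day], ask_day: [silence], silence: []}

# The request for a day cannot be followed by a user inform act in this model.
print(violations("testRequest.Inform: s(request{}) ; ? u(inform{}) ;", model))
```

As in the analysis described above, the sketch looks only at annotations and succeeds as soon as one matching continuation exists; it does not evaluate runtime conditions.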
An act-topic pattern may be fairly general and if this is the case, it may very well be reused across different dialogue models. We have so far specified act-topic patterns to perform the analyses listed below. The list is kept at a general level. It should be noted that some of the rule patterns have been used across all the dialogue models developed so far using DialogDesigner, while others have only been used in one or some of the dialogue models.
Universals: In any input state universals, such as repetition, help, and goodbye, should be included as possible transitions.
Events: In any input state events, such as nothing understood (noMatch in VoiceXML), timeout, and hangup, should be included as possible transitions.
Feedback: Whenever the system provides feedback, the user should have the possibility to reject or repair the feedback. Moreover, it may be desirable that user inform or select acts are followed by feedback from the system. We also have rules that check this.
Common act sequences: There are several, e.g.:
If the system makes an offer, it must be possible for the user to reject the offer or to accept or select anything from the offer.
If the user has selected an offer, it must be possible for the system to provide information.
If the system requests information, it must be possible for the system to receive that information from the user.
Topic reactions: Requests concerning a topic T must have the possibility to be followed by a response concerning T.
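To make the existence check concrete, the following is a minimal sketch in Python of how a rule pattern of the kind listed above might be checked against an annotated model. It is not DialogDesigner's implementation: the names Turn, State, RulePattern and failing_states are hypothetical, the condition is only searched for in prompts (not in transitions), and the offer rule is simplified to require a possible reject only.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Turn:
    """An act-topic annotation of one system prompt or user transition."""
    speaker: str  # "system" or "user"
    act: str      # e.g. "offer", "request", "reject"
    topic: str    # e.g. "route"


@dataclass
class State:
    name: str
    prompt: Turn
    transitions: list[Turn] = field(default_factory=list)


@dataclass(frozen=True)
class RulePattern:
    condition: tuple[str, str]            # (speaker, act) to look for in prompts
    sequent: frozenset[tuple[str, str]]   # acceptable (speaker, act) follow-ups


def failing_states(rule: RulePattern, states: list[State]) -> list[str]:
    """Names of states where the condition occurs but no sequent turn is allowed."""
    failures = []
    for state in states:
        prompt = state.prompt
        if (prompt.speaker, prompt.act) != rule.condition:
            continue  # the rule pattern does not apply to this state
        # Existence check: a single matching transition on the same topic suffices.
        # Only the annotation is inspected; condition fields are not evaluated.
        satisfied = any(
            (t.speaker, t.act) in rule.sequent and t.topic == prompt.topic
            for t in state.transitions
        )
        if not satisfied:
            failures.append(state.name)
    return failures


# Hypothetical rule: a system offer must allow a user reject on the same topic.
offer_rule = RulePattern(
    condition=("system", "offer"),
    sequent=frozenset({("user", "reject")}),
)
model = [
    State("OfferRoute", Turn("system", "offer", "route"),
          [Turn("user", "accept", "route"), Turn("user", "reject", "route")]),
    State("OfferTime", Turn("system", "offer", "time"),
          [Turn("user", "accept", "time")]),
    State("AskDate", Turn("system", "request", "date"),
          [Turn("user", "inform", "date")]),
]
print(failing_states(offer_rule, model))  # ['OfferTime']
```

Because the sketch only reads the annotation, it shares the limitation noted above: a state may pass the check although its runtime conditions never allow the matching transition.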
Since the act-topic-based analysis checks more formal aspects of well-formedness, we recommend using this part of the analysis tool only once a full draft of the dialogue model has been established.
4.6 Code generation
DialogDesigner supports code generation. Once the dialogue model is reasonably formalised, code can be generated automatically. At present the model can only be compiled into HotVoice code, but compilation to VoiceXML is planned. The generated code may include warning and error messages. For example, there may be a warning that a particular condition is always true and that subsequent transitions therefore have been skipped. Or there may be an error message indicating that there is a transition to an empty state (null state).
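The two kinds of messages mentioned above can be illustrated with a small Python sketch. It assumes, purely for illustration, that each state holds an ordered list of (condition, target) pairs and that an always-true condition is written literally as "true"; the function check_state and the message wording are hypothetical and are not taken from the HotVoice compiler.

```python
from typing import Optional

# (condition, name of target state or None for an empty state)
Transition = tuple[str, Optional[str]]


def check_state(name: str, transitions: list[Transition]) -> list[str]:
    """Collect compile-time messages for the ordered transitions of one state."""
    messages = []
    for index, (condition, target) in enumerate(transitions):
        if target is None:
            messages.append(
                f"error: {name}: transition {index} leads to an empty (null) state")
        remaining = len(transitions) - index - 1
        # Assumption: only the literal condition "true" is recognised as always true.
        if condition.strip().lower() == "true" and remaining > 0:
            messages.append(
                f"warning: {name}: condition {index} is always true; "
                f"{remaining} subsequent transition(s) skipped")
            break  # later transitions can never be taken
    return messages


for line in check_state("MainMenu", [("choice == 1", "Booking"),
                                     ("true", None),
                                     ("choice == 2", "Help")]):
    print(line)
```

A real compiler would need a more general test for always-true conditions than a string comparison; the sketch only shows where such messages would arise.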
5 Reports
Five different reports or presentations of the dialogue model may be extracted in DialogDesigner. Report generation may be considered a special kind of development process support tool. Two of the enabled presentations are meant for communication with and use by phrase speakers. One of these presentations is a phrase list while a second is a prompt list, both in HTML. If a phrase is used more than once in the dialogue model the second or later occurrences are struck through to clearly mark repetitions. The advantage of presenting the phrase list as a prompt list is that this makes it clearer to the phrase speaker what the context is.
A third option is to extract the prompt list as a comma-separated (CSV) file. This facilitates import of the file into other tools used by people working in the sound studio. The set of features extracted is configurable.
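As an illustration of these presentations, the sketch below shows in Python how a phrase list with struck-through repetitions and a CSV prompt list with a configurable set of columns might be produced. The function names, the dictionary keys (state, prompt, note) and the use of the HTML s element for striking through are assumptions made for the example, not the formats actually emitted by DialogDesigner.

```python
import csv
import html
import io


def phrase_list_html(phrases: list[str]) -> str:
    """Render phrases as an HTML list; second and later occurrences are struck through."""
    seen = set()
    items = []
    for phrase in phrases:
        text = html.escape(phrase)
        if phrase in seen:
            text = f"<s>{text}</s>"  # mark the repetition
        seen.add(phrase)
        items.append(f"<li>{text}</li>")
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


def prompt_list_csv(prompts: list[dict], columns: list[str]) -> str:
    """Write only the selected columns of each prompt record as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(prompts)
    return buffer.getvalue()


prompts = [
    {"state": "Start", "prompt": "Welcome", "note": "friendly tone"},
    {"state": "End", "prompt": "Goodbye", "note": ""},
]
print(phrase_list_html(["Welcome", "Goodbye", "Welcome"]))
print(prompt_list_csv(prompts, ["state", "prompt"]))
```

Passing a different column list to prompt_list_csv changes which features are extracted, which is the kind of configurability described above.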
The fourth kind of presentation contains the dialogue model in terms of all states with their prompts and possible transitions. Transitions are links which means that the HTML model can be used for navigating the dialogue model, cf. Fig. 10, without having access to DialogDesigner. We have found this HTML presentation very helpful for communicating with customers.
The fifth kind of presentation is much like the fourth one but includes more details for each state, such as grammar information and notes. Thus this presentation is meant for internal communication in the development group where such details are of relevance. As the IDE becomes easier to use, the importance of this report decreases.
6 DialogDesigner goals reviewed
There exists a wealth of tools and IDEs that one way or another support SDS development and evaluation. Some are free while others are not. There are, e.g., plenty of tools and IDEs available for developing and testing VoiceXML applications, see e.g. http://www.w3.org/Voice, and speech development kits (SDKs) from voice companies, such as Nuance and Loquendo, normally come with a suite of tools some of which support dialogue development.
Fig. 10 Excerpt of HTML presentation of the dialogue model
In the following we shall briefly review the goals of DialogDesigner and relate to other, existing tools. Table 2 summarises the achievements in constructing DialogDesigner.
6.1 Contemporary dialogue complexity
The first goal is support for modelling of contemporary dialogue complexity, i.e. DialogDesigner must support modelling of today's state-of-the-art dialogues that are heterogeneous and task-oriented. However, having a goal addressing contemporary dialogue complexity is equivalent to having a moving target. To illustrate this, let us look at a few examples.
In (Dybkjær & Dybkjær, 2004b) we described how, in a system from 2001, we explicitly modelled barge-in by measuring the time the user spent listening to a prompt. If the time in milliseconds was less than the time needed to speak the prompt, we knew the user had used barge-in. We made this explicit modelling because we needed the feature and the implementation language HDDL (Aust, Oerder, Seide, & Steinbiss, 1995) provided in the SpeechMania platform did not support event handling of barge-in detection. However, in the specification of VoiceXML 2.1 from June 2005, which is now supported by VoiceXML platform providers, there is a primitive <mark> after which barge-in will fill in the two variables markname and marktime, described exactly under the heading Using <mark> to Detect Barge-in During Prompt Playback.
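The timing comparison behind this explicit barge-in modelling can be sketched in a few lines of Python. The original was written in HDDL; the function names below and the assumption that a prompt's duration is the sum of the durations of its recorded phrases are ours, chosen only to illustrate the idea.

```python
def prompt_duration(phrase_durations_ms: list[int]) -> int:
    """Assumption: a prompt is a sequence of recorded phrases played back to back."""
    return sum(phrase_durations_ms)


def used_barge_in(listening_ms: int, prompt_duration_ms: int) -> bool:
    """True if the user stopped listening before the prompt had been fully spoken."""
    return listening_ms < prompt_duration_ms


duration = prompt_duration([1200, 800, 1500])  # 3500 ms in total
print(used_barge_in(2100, duration))  # True: the user spoke before the prompt ended
print(used_barge_in(3500, duration))  # False: the whole prompt was heard
```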
At least it was possible in HDDL to implement barge-in detection via more primitive timing predicates. For other features implementation may be infeasible if not supported by the platform. For example, Kölzer (2002, p. 133) notes that VoiceXML 1.0 does not support N-best recognition results. This feature is supported in VoiceXML 2.0 via the variable application.lastresult$ (March 2004).
A third example is anaphora resolution which, as noted by (Kölzer, 2002), is not supported in VoiceXML. However, to the extent that one has a recipe for resolving references, one may implement anaphora resolution on a VoiceXML platform since VoiceXML 2.0 provides access to the recognised string. So anaphora resolution is something that is not directly supported by the contemporary platform, but which may be encoded on that platform although with some difficulty.
Table 2 Features of DialogDesigner. +: Has feature. -: Does not have feature. *: In pipeline
Feature | Status | Note
Graph design | * | This feature is somewhat overlapping with the use of snippets.
Graph view | + |
Log analysis | - | Simulation logs may be annotated manually.
Standard dialogues | - | State inclusion provides standard reactions but not standard dialogues. A limited kind of standard dialogues can be obtained using the continue primitive.
State conditions | + | Predicate logic over dialogue and domain state.
WOZ | + | Simulation via the snippet tool.
Play prompts | * | Phrases may be played.
Record prompts | + | Via import from and export to Recording Station which is a tool distributed with SpeechPearl from Nuance.
Structured prompts | + | Grammar with embedded generative predicates.
Print model | + | HTML, with transitions as links.
Phrase list | + | CSV or HTML, for sound recording studio.
Prompt list | + | HTML, shows phrases in prompt context.
Code generation | + | Currently compilation into HotVoice. Support for VoiceXML is expected soon.
Debugging | |
Make test scripts | + | Snippets may be generated from the model.
Regression test | + | Snippets can be repeatedly tested.
Speech recognition | |
Telephony | (+) | Via predicates in the target platform for the compilation.
There are also features that theoretically can be used, but for which there is no platform support. An example is that certain prosodic features have been shown to be good indicators of aware sites of system errors which might be useful in deciding on the dialogue strategy (Hirschberg, Swerts, & Litman, 2001). However, no commercial recognisers support that yet, so dialogue models that depend on that feature are not a present target for DialogDesigner.
These examples serve to show that while DialogDesigner must support modelling of state-of-the-art tasks and dialogues, it should also be flexible and extensible.
6.2 Communication
The second goal is communication with stakeholders about dialogue model design. This includes presentation of dialogue flow, simulation of dialogues, and extraction of prompts and phrases for validation and recording purposes. This goal is where DialogDesigner really distinguishes itself from other tools. There are several tools which enable e.g. WOZ simulation or visualisation of the dialogue model in terms of a graph structure. However, their focus is not specifically on communication with stakeholders.
6.2.1 Presentation of dialogue flow
Many tools try to provide intuitive scripting tools, e.g. IBM WebSphere (http://www.ibm.com/websphere), the Edify editor (http://www.edify.com), or HotVoice (http://www.dolphin.no). These tools often make it easy for non-dialogue expert technicians with some programming expertise to script small, straightforward dialogues. However, they do not solve the problem of communicating with stakeholders. In fact, dialogue flow and prompts are deeply entangled in the (scripted) programs.
The drafting of dialogue snippets in DialogDesigner is somewhat comparable to Suede (Klemmer et al., 2000). But snippets are more detached from the formal model: Snippets are mapped into the model whereas the graphs in Suede become the dialogue flow directly. Moreover, snippets are allowed to be only part of a concrete dialogue, whereas Suede makes complete dialogues from the start to the end.
That being said, the graphical presentation of Suede with prompts directly in the nodes seems quite intuitive, and the ability to manipulate and edit the model by directly editing the graph is nice. However, graphs tend to quickly become cluttered and difficult to lay out readably.
6.2.2 Simulating dialogues
Tools other than DialogDesigner exist which are meant to support the design and evaluation of SDSs and which support WOZ simulation. Two such tools are Suede and the WOZ tool developed by Breuer (2006) as a by-product of his work at Nuance. Basically the WOZ facility in both cases enables the designer to select a prompt from a list of available prompts given the present state. The selected prompt is played or spoken to the user. Based on the user's answer the designer again selects one among the now available prompts, etc. In Suede simulation of recognition errors is supported.
6.2.3 Lists of prompts and phrases
The ability to present phrase lists is also found in other tools. For instance, in SpeechMania all text constants in the HDDL program may be extracted to a phrase list. This is fine except that also non-phrase strings are extracted, and that there is no relationship between the phrases in the list and the prompts in which they occur. As shown in (Dybkjær & Dybkjær, 2004b) the ability to present prompts, e.g. to domain experts, may be crucial for the system's correctness. Also, some customers wish to have their management review and approve all prompts since they become an important part of the company's external image.
6.3 Efficient development of code
The third goal concerns efficient development of code. This goal includes work separation, reuse, code generation, and automatic analyses of the dialogue model.
As we have seen, DialogDesigner has some support for code development, e.g. in terms of code generation to HotVoice and automatic analyses of aspects of well-formedness. Efficient code development is core to many tools although they may have different ways in which to support it. VoiceXML tools, e.g., don't do code generation since you script your dialogues in VoiceXML. However, they may include libraries of small frequently used dialogue parts, such as obtaining a date, which makes code writing efficient. Such libraries may be seen as support for reuse. There are also tools which support you in building your own libraries. The GEMINI platform (Hamerich et al., 2004), e.g., supports reuse by allowing all developed models to be saved as libraries for reuse in future applications. SDKs normally come with a suite of tools which may be used by different people from the development team (work separation) and which include support for automatic analyses of various kinds. Of course the code developed using an SDK may also be reused later, if relevant, even if there is no support for building a library.
7 Discussion and future work
We have described DialogDesigner which is a tool in support of a rapid and iterative SDS dialogue model development process. In the following we briefly discuss its strengths, our experience so far, and future work. More information on DialogDesigner, including colour pictures, can be found at http://www.SpokenDialogue.dk.
7.1 Software development process support
We have presented the three main goals of DialogDesigner and how it supports a modern iterative software development process (Sect. 2). To achieve the goals we have enabled electronic dialogue modelling (Sect. 3) and constructed a suite of tools which support the development process (Sects. 4 and 5). In Sect. 6 we discussed the goals and achievements of DialogDesigner in relation to other work on SDS dialogue model development support.
DialogDesigner clearly has its strengths in process support, in particular with respect to stakeholder communication whereas it provides state-of-the-art support regarding the two goals of contemporary dialogue complexity and efficient code development (Sect. 6). The communication support includes presentation, simulation and report facilities. HTML reports, graphical visualisation and concrete dialogue snippets may be used for presentation of the dialogue model. Walkthroughs and WOZ sessions may be used to simulate dialogues prior to dialogue model implementation and thus allow for early error correction. Lists of prompts and phrases may be extracted for validation and recording purposes.
7.2 Experience
Since DialogDesigner is quite new, we have very limited experience from using it and we have made no formal, empirical investigations of the extent to which it helps improve the development process. Possible issues to look for in order to evaluate improvements would be
better product quality;
more satisfied customers and customer representatives;
faster development process for designers and developers;
sales argument used by marketing people.
Each of these issues may be quite difficult to evaluate. In fact experience from a large number of development projects with and without DialogDesigner would be the best source for a fairly reliable evaluation. This is data we don't have.
Product quality is influenced by process quality although the connection is complex and not entirely understood. To test which difference DialogDesigner makes to product quality one would in principle need two identical development teams developing the same application under the same conditions and using roughly the same process, but with one team using DialogDesigner while the other does not. A comparative evaluation of the two resulting systems could then be performed. In practice this does not work e.g. because you will never have two identical development teams. A further complication would be that the comparative evaluation could not be entirely objective since the quality models for the overall interaction with the SDS can cover only a part of the factors influencing perceived quality (Möller, 2004, preface).
Improved customer satisfaction is also difficult to measure without two almost identical development processes as described above. Customer satisfaction per se can of course be measured but it would be difficult to tell if an improvement (or the opposite) is due to the use of DialogDesigner since there are so many other parameters that may influence customer satisfaction.
Whether the development process becomes faster when DialogDesigner is used would again require the comparison of identical processes with and without the use of DialogDesigner or experience from many projects with and without the application of DialogDesigner. We don't have much data so all we can say is that we have a feeling that the development process with DialogDesigner involved is faster than without. One reason may be that DialogDesigner ensures a larger degree of consistency between specification, design and the actual system than we would otherwise have had. Another reason may be that systematic use of a tool like DialogDesigner encourages a systematic development process. Moreover, DialogDesigner ensures better possibilities for testing the dialogue model from early on. For example, the test-first concept from Extreme Programming (XP) where tests are prepared before the system is implemented (see e.g. (Beck, 1999) and http://www.testdriven.com) also forms part of DialogDesigner in the sense that the dialogue model is tested before it is implemented and scenarios for tests of the implemented system may be prepared in advance.
Thus it is our impression that DialogDesigner helps save time because the basis for implementation is better and contains fewer errors than would be the case with a less thorough design process.
The last point on the list above may be evaluated by looking at whether the marketing people use results from DialogDesigner as part of their sales arguments because it is fairly easy to generate something which looks good. This is also a point we cannot evaluate yet.
7.3 Future work
DialogDesigner is being extended and improved when time allows and need arises. Extensions are dictated by practical needs or driven by theoretical interests. There are many improvements and additions we can think of and which perhaps will be realised at some point of time in the future. Our current primary goals encompass the following extensions, in prioritised order:
VoiceXML generation, so that DialogDesigner conforms to the mainstream platforms. This implies the need for a more abstract formalisation of conditions and actions, removing any dependence on the HotVoice platform.
Better modelling facilities. This includes the following points with the first one having the highest priority:
creation of catalogues of dialogue patterns, including tools support for specific patterns, such as lists of options, cf. (Balentine & Morgan, 2001);
support for domain modelling, cf. (Kölzer, 2002);
more powerful generation of prompt specifications;
support for semantics/grammar modelling.
Further act-topic exploitation. In particular we need more experience on the strength of act-topics, cf. the first point on the following list:
more experience on the relative strength of act-topic patterns and of snippets with their mapping into state-transitions;
a more expressive act-topic rule notation. This may for instance be done by introducing regular expression operators, such as *, +, and [ ], or variables such as offer(T) select(T);
the possibility to view the actual sequence of prompts and transitions that satisfied the act-topic rule pattern in case of a positive analysis result;
multiple acts in prompts. This will add to the complexity, but we need it since allowing one speech act only is e.g. not compatible with implicit feedback.
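To indicate what a more expressive rule notation with regular expression operators and topic variables might look like in practice, the following Python sketch encodes a rule of the form offer(T) select(T) as an ordinary regular expression with a back-reference for the shared topic T. This is only one possible prototype and not a planned design: the textual encoding of turns as act(topic) and the helper names compile_rule and satisfies are assumptions.

```python
import re


def compile_rule(condition_act: str, sequent_act: str) -> re.Pattern:
    """Build a regex for 'condition_act(T) ... sequent_act(T)' with a shared topic T."""
    return re.compile(
        rf"\b{re.escape(condition_act)}\((?P<T>\w+)\)"  # e.g. offer(T), binds T
        rf"(?: \w+\(\w+\))*"                            # any intervening turns
        rf" {re.escape(sequent_act)}\((?P=T)\)"         # e.g. select(T), same T
    )


def satisfies(rule: re.Pattern, turns: list[str]) -> bool:
    """turns is one path through the model, each turn written as act(topic)."""
    return rule.search(" ".join(turns)) is not None


rule = compile_rule("offer", "select")
print(satisfies(rule, ["offer(route)", "inform(time)", "select(route)"]))  # True
print(satisfies(rule, ["offer(route)", "select(time)"]))                   # False
```

The match object returned by rule.search also records which part of the path satisfied the pattern, which relates to the wish above to view the actual sequence behind a positive analysis result.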
References
Aust, H., Oerder, M., Seide, F., & Steinbiss, V. (1995). The Philips automatic train timetable information system. Speech Communication, 17, 249-262.
Balentine, B., & Morgan, D. P. (2001). How to build a speech recognition application: A style guide for telephony dialogues (2nd ed.). San Ramon, California: EIG Press.
Beck, K. (1999). Extreme programming explained. Embrace change. Pearson: Addison-Wesley.
Breuer, R. (2006). Wizard of Oz tool. Technical Report, latest update 2006-07-13, first version 2001, http://www.softdoc.de/woz
Dybkjær, H., & Dybkjær, L. (2004a). From acts and topics to transactions and dialogue smoothness. In Proceedings of the fourth international conference on language resources and evaluation (LREC), volume V, pp. 1691-1694, Lisbon, Portugal.
Dybkjær, H., & Dybkjær, L. (2004b). Modeling complex spoken dialog. IEEE Computer, August, 32-40.
Dybkjær, H., & Dybkjær, L. (2006). Act-topic patterns for automatically checking dialogue models. In Proceedings of the fifth international conference on language resources and evaluation (LREC), pp. 909-914, Genoa, Italy.
Grosz, B. J., & Sidner, C. L. (1986). Attention, intentions, and the structure of discourse. Computational Linguistics, 12(3), 175-204.
Hamerich, S. W., de Cordoba, R., Schless, V., d'Haro, L. F., Kladis, B., Schubert, V., Kocsis, O., Igel, S., & Pardo, J. M. (2004). The GEMINI platform: Semi-automatic generation of dialogue applications. In Proceedings of the 8th international conference on spoken language processing (Interspeech), pp. 2629-2632, Jeju Island, Korea.
Harris, R. A. (2005). Voice interaction design. Morgan Kaufmann Publishers.
Hirschberg, J., Swerts, M., & Litman, D. (2001). Labeling corrections and aware sites in spoken dialogue systems. In Proceedings of 2nd SIGdial workshop on discourse and dialogue, pp. 72-79, Ålborg, Denmark.
Jacobson, I., Booch, G., & Rumbaugh, J. (1999). The unified software development process. Addison-Wesley.
Klemmer, S. R., Sinha, A. K., Chen, J., Landay, J. A., Aboobaker, N., & Wang, A. (2000). SUEDE: A wizard of Oz prototyping tool for speech user interfaces. In CHI letters, the 13th annual ACM symposium on user interface software and technology: UIST, volume 2(2), pp. 1-10.
Kölzer, A. (2002). DiaMod. Ein Werkzeugsystem zur Modellierung natürlichsprachlicher Dialoge. PhD thesis, DaimlerChrysler AG and University of Koblenz. Berlin: Mensch & Buch Verlag.
Möller, S. (2004). Quality of telephone-based spoken dialogue systems. Springer.
Rich, C., Sidner, C. L., & Lesh, N. (2001). COLLAGEN: Applying collaborative discourse theory to human-computer interaction. AI Magazine, 22(4), 15-25.
Steensig, J. (2001). Sprog i virkeligheden. Bidrag til en interaktionel lingvistik. Aarhus Universitetsforlag.