
  Research Activity 1: Multi-modal dialogues

'...For some service robots, it is sufficient to execute commands given by a human instructor, while a robot companion must show a very natural communicative behaviour in order to become highly accepted by humans...'


Partners involved

Objectives

Results

Final review presentation

Videos

Deliverables


Partners involved

UniBi (RA leader), KTH, UH and LAAS

Objectives

Spoken language is generally considered the most natural means of communication for humans. However, in real-life situations, speech will always be augmented by other modalities, such as gesture, mimic or gaze. Moreover, communication frequently incorporates visual or haptic information about the current situation. Single modality dialogue restricted to speech only can be considered as an artificial way of communication induced by telecommunication technologies. This might be appropriate for human-computer interaction as long as the communication partners do not need to refer to a common environment. In embodied communication, however, both communication partners act in and are able to manipulate a shared situational context. As a consequence the reference to a common environment plays an important role in the communication, which therefore will be multi-modal.

A robot companion is designed to operate in an environment populated by humans. It interacts with persons in their vicinity. Its communication capabilities must match those of its human peers as closely as possible. Therefore the robot must have the ability to engage in a multi-modal dialogue integrating speech and gestures, as well as visual and haptic information.

In that context, the goal of this research activity will be to develop conceptual models, investigate algorithmic solutions and implement and evaluate prototypical systems exhibiting the fundamental capabilities of multi-modal dialogue.

Work conducted in RA1 is related to Key Experiment 1 (The Robot Home Tour) but also related to the other Research Activities.

 

Results

WP1.1 Adaptive multi-modal dialog

The main focus of WP1.1 was the design and implementation of adaptive dialogue strategies based on an appropriate user model. The goal of this effort is to increase the usability and acceptability of the system for different types of users. Our approach follows the well-established general adaptation schema proposed by Jameson. The central construct in this schema is the user model, which specifies characteristics, needs, preferences and goals of users that are typical for an application. The content of the user model is often defined in terms of variables whose values differ between users. The major innovation of our approach is the broad coverage of the user model and the system's ability to vary its initiative behaviors. The main components of the adaptation scheme are summarised in the following.

  • User model
  • The user model specifies characteristics of users concerning their interaction experience and preferences. In our current implementation, each user is modeled by three variables: experience level (and interaction frequency), help-seeking type and interaction type, which address the issues of task success, interaction control and interaction pleasure, respectively.
  • Information collection
  • The following information is collected from the user: (1) the interaction frequency of a given user (for user experience level and user interaction frequency); (2) whether the user follows the system's task-related initiatives, that is, whether she reacts positively to help given by the system (for help-seeking type); (3) whether the user uses phrases that are typical for human-human interactions, e.g., "Thank you", "Sorry", "How are you?" (for interaction type).
  • User model acquisition
  • Once a user is identified at the beginning of the interaction, her values for experience level and interaction frequency are increased.
  • User model Application
  • During an interaction, the dialog system interprets interaction situations depending on the values of the user model. Some of the interaction situations require an online evaluation of the user's and the system's performance. An indicator for performance problems of the user is the number of illegally proposed tasks. Indicators for robot performance problems are perception problems such as speech and gesture recognition errors.
  • Prediction or decision about the user
  • The behavior variations of the system concern the content and the frequency of the system's initiatives. Four attributes are specified for each initiative: Type (task-related or social initiative), Group (min, med or max, indicating the frequency of initiative that should be used for the current user), Trigger (the interaction situation in which the system has to make a decision about taking the initiative) and Priority (the priority of this initiative, relevant in case of multiple possible initiatives). Given a trigger and the current user profile (based on the values of the user model), the system decides whether to take an initiative of either type and/or which initiative to take.
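The decision scheme described above can be sketched in a few lines of code. This is a minimal illustration under our own naming assumptions (UserModel, Initiative, select_initiative and the group-matching rule are invented for this sketch), not the project's actual implementation:

```python
from dataclasses import dataclass

@dataclass
class UserModel:
    experience_level: int      # grows with the user's interaction frequency
    help_seeking: str          # "accepts_help" or "declines_help"
    interaction_type: str      # "social" or "task_oriented"

@dataclass
class Initiative:
    name: str
    type: str                  # "task" or "social" initiative
    group: str                 # "min", "med" or "max" frequency group
    trigger: str               # interaction situation that activates it
    priority: int              # relevant when several initiatives apply

def frequency_group(user: UserModel) -> str:
    """Map the user profile to the initiative frequency the system should use."""
    if user.experience_level < 2:
        return "max"           # novices receive frequent initiatives
    return "med" if user.help_seeking == "accepts_help" else "min"

def select_initiative(trigger, user, initiatives):
    """Given a trigger and the current user profile, decide whether to take
    an initiative and which one (highest priority wins, or None)."""
    allowed = {"min": {"min"}, "med": {"min", "med"},
               "max": {"min", "med", "max"}}
    group = frequency_group(user)
    candidates = [i for i in initiatives
                  if i.trigger == trigger and i.group in allowed[group]]
    return max(candidates, key=lambda i: i.priority, default=None)
```

For example, a novice user would receive the highest-priority applicable initiative, while an experienced user who declines help would only receive initiatives in the "min" frequency group.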

The figure shows the different concepts assigned to the various Mindi displays

Our adaptation approach is a powerful mechanism whose benefits include the following three points. Firstly, the three variables of the user model address both the usefulness and the social ability of a robot companion and allow them to be operationalized with clearly defined system behaviors. Secondly, the initiative frequency parameters soften the adaptation behavior of the system and reduce user confusion caused by variable system behavior (a common problem for adaptive systems). Thirdly, the system can be implemented in such a way that the major behavior control is moved out into configuration files, so that modifications can be carried out easily.
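Moving behavior control into configuration files could look roughly as follows; the section layout and key names here are our own invention for illustration, not the project's configuration format:

```python
import configparser

# Hypothetical example: initiative attributes declared externally,
# so behavior changes need no code modification.
EXAMPLE_CONFIG = """
[initiative.offer_help]
type     = task
group    = max
trigger  = user_idle
priority = 2
"""

def load_initiatives(text):
    """Parse initiative declarations from an INI-style configuration string."""
    cp = configparser.ConfigParser()
    cp.read_string(text)
    initiatives = {}
    for section in cp.sections():
        if section.startswith("initiative."):
            name = section.split(".", 1)[1]
            initiatives[name] = {
                "type": cp[section]["type"],
                "group": cp[section]["group"],
                "trigger": cp[section]["trigger"],
                "priority": cp.getint(section, "priority"),
            }
    return initiatives
```

The point of this design is that tuning the robot's initiative behavior for a new user group becomes an edit to a text file rather than a code change.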

Observations from evaluation: Evaluation mainly focused on the adaptation of the speech input control, which turned out to be very successful compared to the previous approach. While without an adaptation scheme only 63% (74% after user instruction via a demo video) of the user utterances were processed by the system, the adaptive input control increased the overall processing rate to 96% while the false positive rate did not increase significantly. Further noteworthy results concern the adaptation of the system's task-level initiative. Although the adaptation scheme foresees different levels of information provided to the user depending on the user's type, it turned out that regardless of the user type, the robot has to provide much more specific information in many situations. We therefore designed and implemented a much more explicit grounding strategy. For example, the verbal feedback concerning the instruction for the user after a (system- or user-initiated) system reset now specifies the time the user should wait before s/he continues the interaction ('Wait 5 seconds after the reset and then say hello again'). This change was added because users would wait for several minutes for a signal from the robot after the reset, apparently expecting a complete system reboot. Also, the virtual robot character 'Mindi', which provides non-verbal feedback, has become more active in order to display internal system state information in more detail. Mindi displays relevant system states (such as 'following', 'processing', 'having technical problems', 'not understanding', 'having performance problems' and 'being surprised') as well as important perceptual states (such as 'seeing no person', 'seeing one person', 'seeing two persons', etc.).
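The state feedback described above amounts to a mapping from internal states to displays. A minimal sketch, assuming invented display identifiers (the state names follow the text, but the `mindi_*` strings and function names are our own illustration):

```python
# System-state feedback: internal states named in the text, mapped to
# hypothetical display identifiers for the virtual character.
SYSTEM_STATE_DISPLAY = {
    "following": "mindi_following",
    "processing": "mindi_processing",
    "having_technical_problems": "mindi_technical_problems",
    "not_understanding": "mindi_not_understanding",
    "having_performance_problems": "mindi_performance_problems",
    "being_surprised": "mindi_surprised",
}

def perception_display(n_persons: int) -> str:
    """Perceptual feedback: how many persons the robot currently sees."""
    if n_persons == 0:
        return "mindi_no_person"
    return f"mindi_{n_persons}_persons"

def mindi_display(state: str, n_persons: int):
    """Combine system-state and perceptual feedback into one display update,
    falling back to a neutral display for unknown states."""
    return (SYSTEM_STATE_DISPLAY.get(state, "mindi_neutral"),
            perception_display(n_persons))
```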

Lessons Learned: In general, it can be stated that in more task-oriented interaction sequences the system feedback needs to be more specific regardless of the user type. However, analyses of the users' reactions to the system's feedback indicate that there are situations where the goal of maintaining mutual understanding is neglected in favor of keeping up the communicative flow. Such situations arise when the interaction is more socially driven, for example during the greeting or good-bye phases. In these phases, users do not care whether the system understands or reacts in an unexpected way; if this happens, users will simply proceed with the interaction. Thus, the expectations of the user play an important role in the interaction and in the acceptance of the system's feedback, and will therefore be a necessary focus of future research.

WP1.2 Representation and integration of knowledge for an embodied multi-modal dialogue system

Handling of multiple-user situations We implemented a mechanism that allows the system to correctly decide whether to process speech input forwarded by the speech recognizer, which is important for the robot to achieve situation awareness. The dialog system draws on three different parameters to make this decision: the content of the speech input, the input from the person detection module and the dialog expectation. The graded decision-making process of this approach has the advantage that different levels of restriction can be adopted in different situations.
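A graded decision of this kind can be sketched as follows. This is our own simplified illustration of the three information sources named above (the parameter names, restriction levels and the "biron" address check are assumptions, not the project's code):

```python
def should_process(utterance: str,
                   speaker_in_view: bool,
                   expected_phrases: set,
                   restriction: str = "medium") -> bool:
    """Decide whether the dialog system should act on a recognized utterance.

    Graded restriction levels adopt different evidence requirements:
      "low"    - accept any utterance while a person is in view
      "medium" - additionally require a match with the dialog expectation
      "high"   - additionally require an explicit address of the robot
    """
    text = utterance.lower()
    matches_expectation = any(p in text for p in expected_phrases)
    if restriction == "low":
        return speaker_in_view
    if restriction == "medium":
        return speaker_in_view and matches_expectation
    # "high": e.g. in multi-party situations, require the robot's name too
    return speaker_in_view and matches_expectation and "biron" in text
```

In a crowded room the system could switch to the "high" level, while a one-on-one tutoring phase could relax to "low".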

Integration of planning with dialog modeling The objectives were (1) to investigate the integration of a dialog component with a planning module and to realize the link between the robot supervision system (Shary) and the high-level HATP planning system, to implement a dedicated communication scheme that supports robot task achievement in a human context, and (2) finally to integrate these features in the KE2 experiment. We have considered three challenges: (A) starting and ending the interaction, (B) robot-initiated plan modification, and (C) user-initiated plan modification. Note that complementary information concerning integration can be found in the RA6 and RA7 deliverables.

Snapshot during communicating spatial hierarchy scenario


Integration of dynamic topic tracking with dialog modeling In order to demonstrate situation awareness with respect to the semantics of the user's utterances, we integrated an approach to dynamically track topics in situated human-robot interactions. By dynamically we mean that neither the topics nor the number of topics are predefined; both are adjusted dynamically during the interaction. This enables the system to draw simple inferences without explicit reasoning and without a predefined knowledge base, by associating words and objects that 'belong', that is, co-occur together. The topic tracker has been integrated with the dialogue module and has been evaluated offline (precision and recall of topic classification) and online. It was shown that subjects were sensitive to the robot's ability to detect connections in the discourse.
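The co-occurrence idea behind the topic tracker can be sketched as follows; this is a deliberately minimal simplification under our own assumptions (class name, threshold rule), not the project's implementation:

```python
from collections import defaultdict
from itertools import combinations

class TopicTracker:
    """Associate words and object labels that co-occur across utterances,
    without a predefined topic inventory or knowledge base."""

    def __init__(self, threshold: int = 2):
        self.cooc = defaultdict(int)   # (word_a, word_b) -> co-occurrence count
        self.threshold = threshold     # minimum count to count as 'belonging'

    def observe(self, utterance_words):
        """Count which words/labels occur together in one utterance."""
        for a, b in combinations(sorted(set(utterance_words)), 2):
            self.cooc[(a, b)] += 1

    def associated(self, word):
        """Words that 'belong' to the given word, i.e. co-occur often enough."""
        out = set()
        for (a, b), n in self.cooc.items():
            if n >= self.threshold:
                if a == word:
                    out.add(b)
                elif b == word:
                    out.add(a)
        return out
```

After observing a few utterances mentioning "cup" and "kitchen" together, the tracker would infer that the two belong to the same topic, enabling the simple inferences described above.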

Integration of localisation with dialog modeling This concerns our work on spatial representation by integrating the localisation approach from KTH with the dialogue. Based on experiences from user studies in RA1 and RA3 a mapping hierarchy was defined where spatial points are represented as either rooms, areas or objects, depending on how users refer to them and what role they play in further interactions. Our hypothesis is that it will be easier for users to interact with a robot when they have similar representations of the spatial environment. Further publications within this WP in the third phase concern results from work accomplished mainly in the second phase of the project.
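The mapping hierarchy described above can be sketched as a small tree of labeled spatial points; the class and function names here are our own illustration, not the KTH localisation interface:

```python
from dataclasses import dataclass, field

@dataclass
class SpatialEntity:
    """A user-labeled spatial point: a room, an area or an object."""
    name: str
    kind: str                          # "room", "area" or "object"
    position: tuple                    # (x, y) in map coordinates
    children: list = field(default_factory=list)

def find(node, name):
    """Depth-first lookup of a named entity in the hierarchy."""
    if node.name == name:
        return node
    for child in node.children:
        hit = find(child, name)
        if hit is not None:
            return hit
    return None

def add_labeled_point(root, name, kind, position, parent_name=None):
    """Attach a point below its parent (e.g. an object inside a room),
    depending on how the user referred to it during the home tour."""
    node = SpatialEntity(name, kind, position)
    parent = find(root, parent_name) if parent_name else root
    (parent or root).children.append(node)
    return node
```

The hypothesis in the text translates to this structure directly: if the user says "this is the kitchen" and later "this is the table", the table can be attached under the kitchen, matching the user's own spatial representation.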

WP1.3 Evaluation methods

The evaluation activities related to RA1 started with the Wizard-of-Oz study performed in 2004. This was an evaluation of an integrated system with the aim of putting users into a context that can be characterized as "future use", i.e., the users were interacting with something that we aimed to build in the years to come. The results of the analyses informed the design of the system development. Subsequent evaluation activities targeted specific dialogue design issues. In the last phase we evaluated and refined the design of the BIRON robot, which is used to instantiate the Home Tour scenario (KE1), during three use sessions and analyses, followed by a refinement of the system. We have also used a video-based approach to measure attitudes towards different communicative robot behaviors.

Evaluation activities in RA1

Evaluation activities in RA1. The figure is based on the evaluation activities carried out during the project

Lessons learned: Interaction

  • Users expect natural language to be used consistently
  • Users expect the system to appear human-like when it uses natural language to provide information. One issue illustrates this more than anything: according to the dialogue model of the system, which was based on grounding and implemented using a finite-state approach, the user was supposed to say "stop" to end the ongoing activity. The situations that occurred as a consequence of this dialogue design appeared slightly peculiar or inconsistent to most users. This means that even if the system acts according to its algorithm, and even if the human adapts to the model, the interaction can appear illogical. The effort of adding transparency by mapping dialogues closely to the inner workings and states of the dialogue system requires that what becomes visible to the user also makes sense.
  • Users' assumptions about human-like qualities
  • We have learned that there are some things that users seem to have a hard time understanding or accepting. In the evaluation of the Home Tour scenarios we have a robot whose linguistic expressivity is at the same level of advancement as a human's. Even if we are not comparing the robot with humans per se, human-like qualities may raise expectations or trigger models of understanding that might cause unforeseen use behaviors. This means that we cannot see a robot only as a machine equipped with some verbal components, e.g., speech recognition and speech synthesis. When we trigger the affordances, or mental model, of something that is capable of using natural language, we need to consider what this entails when some aspect that we usually find in people is not present. Even if users appear to understand the robot as being a robot with limited perceptual capability, they have a very vague idea of how the technology works; more importantly, they seem to assume that the perceptual capabilities of a robot work in a similar manner as those of a human.

Lessons learned: methodological considerations

  • Testing the right thing
  • When designing a system that incorporates real and wizard-simulated components, we need to take care that we do not add functionality that overshadows the properties of the system we are about to test. In other words, we need to establish that the contribution of a specific component to the behavior of the robot is tested, rather than some other component that we are not interested in.
  • Evaluation of subconscious design elements
  • A design that aims to activate, or provides cues for, perceptual processes that users are not aware of at a conscious level poses a challenge when it comes to evaluation methodologies. It is hard to use qualitative methods, such as questionnaires and interviews, to assess attitudes towards components that are not in focus for the user during the use session, especially when we do not want to influence the users so that they become self-conscious and act upon their own communicative behavior.
  • Expectation management
  • Another theme with a similar effect was the addition of the ad-hoc capability of recognizing and uttering the name of the user in the initial greeting phase of the KE1 evaluation. We have established in the interviews that the users seemed to appreciate this feature of the system. There seem to be different ways of understanding this positive feedback. We can interpret it as a positive evaluation of a "future feature" of the system, and then bring this into the implications for the next step in the design. However, this feature might raise expectations and give an overly optimistic picture of what the particular system being evaluated is capable of.
  • Providing appropriate boundaries on interaction
  • It is very important to provide sufficient instruction to allow the user to become involved in the scenario. There also has to be sufficient room for user initiative and improvisation. In practice this means including support for providing boundaries for the user. Ideally, when boundaries are in place, the user can act within the scenario without arriving at a state in the interaction where it is unclear what the possible actions are.

Deliverables

2004 - RA1.1.1 Report on the declarative dialogue model and strategy
2004 - RA1.2.1 Report on the formalism of the modality integration scheme
2004 - RA1.3.1 Report on the evaluation methodology of multi-modal dialogue
2005 - RA1: Multi-modal dialogs
2006 - RA1: Multi-modal dialogs
2007 - RA1: Multi-modal dialogs

Final review presentation

RA1 presentation (by Britta Wrede)

Videos


The beginning of this video shows the first user study, carried out as a Wizard-of-Oz study in RA1 at KTH, with a simulation of the state-based dialog in the first year of the project. The second part of the video shows the current state of the dialog system within the KE1 scenario (Home Tour). It shows that, by integrating more functionalities (e.g. spatial mapping and localization) and using a real system instead of a simulated one, much more interaction management is necessary. This is because many technical issues could not be foreseen in the initial Wizard-of-Oz study, such as the robot turning around to scan the room, or getting stuck in the doorframe (not shown here).

References

Anthony Jameson. Human-Computer Interaction Handbook, chapter Adaptive Interfaces and Agents. Erlbaum, Mahwah, NJ, 2003.

Related publications

Only some of the RA1-related publications are listed below; please see the Publications page for more.

  • A study of interaction between dialog and decision for human-robot collaborative task achievement, Aurelie Clodic, Rachid Alami, Vincent Montreuil, Shuyin Li, Britta Wrede, Agnes Swadzba. RO-MAN 2007, Jeju Island, Korea
  • The Need for a Model of Contact and Perception to Support Natural Interactivity in Human-Robot Communication, Anders Green. RO-MAN 2007, Jeju Island, Korea
  • Characterising Dimensions of Use for Designing Adaptive Dialogues for Human-Robot Communication, Anders Green. RO-MAN 2007, Jeju Island, Korea
  • Multi-Modal Interaction Management for a Robot Companion Shuyin Li. PhD Thesis, Bielefeld University, 2007
  • Why and how to model multi-modal interaction for a mobile robot companion Shuyin Li, Britta Wrede. AAAI Spring Symposium 2007 on Interaction Challenges for Intelligent Assistants (best paper award)
  • Evaluating extrovert and introvert behaviour of a domestic robot - a video study Manja Lohse, Marc Hanheide, Britta Wrede, Michael L. Walters, Kheng Lee Koay, Dag Sverre Syrdal, Anders Green, Helge Hüttenrauch, Kerstin Dautenhahn, Gerhard Sagerer, and Kerstin Severinson-Eklundh. RO-MAN 2008
  • Try Something Else. When Users Change Their Discursive Behavior in Human-Robot Interaction Manja Lohse, Katharina Rohlfing, Britta Wrede, Gerhard Sagerer. ICRA 2008
  • Interaction Awareness for Joint Environment Exploration Thorsten P. Spexard, Shuyin Li, Britta Wrede, Marc Hanheide, Elin A. Topp, Helge Hüttenrauch. RO-MAN 2007, Jeju Island, Korea
  • Making a Case for Spatial Prompting in Human-Robot Communication A. Green, H. Hüttenrauch. Workshop at the Fifth International Conference on Language Resources and Evaluation, LREC 2006, Genova, May 22-27.
  • Measuring Up as an Intelligent Robot - On the Use of High-Fidelity Simulations for Human-Robot Interaction Research A. Green, H. Hüttenrauch, E. A. Topp. PerMIS'06, The 2006 Performance Metrics for Intelligent Systems Workshop, Gaithersburg, MD, August 21-23, 2006.
  • Developing a Contextualized Multimodal Corpus for Human-Robot Interaction A. Green, H. Hüttenrauch, E. A. Topp and K. S. Eklundh. Fifth International Conference on Language Resources and Evaluation, LREC 2006, Genova, May 22-27.
  • Robust speech understanding for multi-modal human-robot communication S. Hüwel, B. Wrede and G. Sagerer. 15th Int. Symposium on Robot and Human Interactive Communication (RO-MAN), 2006.
  • A dialog system for comparative user studies on robot verbal behavior S. Li, B. Wrede and G. Sagerer. 15th Int. Symposium on Robot and Human Interactive Communication (RO-MAN), 2006.
  • Position Paper S. Li. 2nd Young Researchers' Roundtable on spoken dialog systems In conjunction with Interspeech, 2006.
  • Why and how to model multi-modal interaction for a mobile robot companion S. Li and B. Wrede. AAAI Spring Symposium on Interaction Challenges for Intelligent Assistants, 2007.
  • A computational model of multi-modal grounding for human robot interaction S. Li, B. Wrede and G. Sagerer. ACL SIGdial Workshop on discourse and dialog In conjunction with CoLing/ACL, 2006.
  • BIRON, what's the topic? - A Multi-Modal Topic Tracker for improved Human-Robot Interaction J. F. Maas, T. Spexard, J. Fritsch, B. Wrede, and G. Sagerer. Proc. IEEE Int. Workshop on Robot and Human Interactive Communication (RO-MAN), 2006.
  • BITT: A corpus for topic tracking evaluation on multimodal human-robot-interaction J. F. Maas and B. Wrede. Proc. Int. Conf. on Language and Evaluation (LREC), 2006.
  • Towards a multimodal topic tracking system for a mobile robot J. F. Maas, B. Wrede, and G. Sagerer. Proc. Int. Conf. on Spoken Language Processing (Interspeech/ICSLP) 2006.

 

An Integrated Project funded by the European Commission's Sixth Framework Programme