Keywords: Interface Agents, Interactive Graphical System, User Adaptation, Multimodal Input, Open Input.
Therefore, techniques have to be explored and realized which enhance and simplify the interaction with a virtual environment in such a way that users are relieved from technical detail and can concentrate on their primary tasks.
In the last decade, interface agents have become prominent as a new paradigm for the design of more intelligent user interfaces [2, 6]. By mediating the relationship between the technical system and the user, they allow more human-like forms of communication and can thus add comfort to human-computer interaction [3, 5, 7]. Maes [5] has realized personal assistants, e.g., for electronic mail handling and electronic news filtering, which accumulate knowledge about the tasks and habits of their users in order to act on their behalf. In the VIENA project [7], we consider the manipulation of objects in a virtual office by means of simple natural language input. A multiagent interface system acts as a mediator that translates abstract verbal user commands into quantitative, technical commands which are used to update the visualized scene.
My Ph.D. thesis research contributes to the work in the VIENA project. To further enhance the interaction with a virtual environment, I investigate three main aspects: adaptation to user preferences, multimodal input, and open and underspecified input. I use agent-based techniques to develop my solutions. The next three sections describe these aspects in more detail.
In our approach, we consider a system of multiple interface agents which adapts to user preferences by learning from direct feedback, without explicit acquisition of user data. Avoiding explicit user modeling seems a desirable goal because explicit user models have drawn criticism with respect to the privacy of user data [6]. The core idea of our approach to implicit user adaptation is that agents of the same type but with slightly different functionality, corresponding to possible variations of users' preferences, organize themselves to meet the preferences of the individual user. On receiving positive or negative feedback from the user, agents increase or decrease their degree of self-confidence, so that successful agents become dominant in the ongoing session.
From the system-internal point of view, the adaptation process is achieved by a form of reinforcement learning [1]. Learning is realized in such a way that the system takes actions that maximize the reinforcement signals received from the environment. In our approach, this means that users' instructions (or corrections, respectively) represent reinforcement signals which are interpreted and encoded by the interface agency in the form of credit values. Each agent stores a credit value corresponding to its quality ("strength") at discrete periods of time. Learning is achieved by adjusting agents' credits according to the users' feedback and by assigning the task in question to those eligible agents which have maximal credit.
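The following sketch illustrates this credit mechanism in Python. The class Agent, the fixed reward step, and the reference-frame variants are my own illustrative assumptions, not the VIENA implementation.

```python
# Minimal sketch of credit-based agent selection and reinforcement.
# Names and the fixed reward/penalty step are illustrative assumptions.

class Agent:
    def __init__(self, name, reference_frame, credit=1.0):
        self.name = name
        self.reference_frame = reference_frame  # preference variant this agent covers
        self.credit = credit                    # current "strength" of the agent

    def eligible(self, task):
        # In the real system, eligibility depends on the task at hand;
        # here every agent is assumed to be eligible.
        return True


def select_agent(agents, task):
    """Assign the task to the eligible agent with maximal credit."""
    candidates = [a for a in agents if a.eligible(task)]
    return max(candidates, key=lambda a: a.credit)


def reinforce(agent, positive, step=0.1):
    """Encode user feedback (instruction or correction) as a credit update."""
    agent.credit += step if positive else -step
    agent.credit = max(agent.credit, 0.0)  # keep credits non-negative


# Usage: two competing agents interpreting "left of the desk" differently.
agents = [Agent("A1", "speaker-centered"), Agent("A2", "object-centered")]
chosen = select_agent(agents, task="place object")
reinforce(chosen, positive=False)    # the user corrects the result
reinforce(agents[1], positive=True)  # the alternative interpretation succeeds
```

After a few such feedback steps, the agent whose reference frame matches the user's habits accumulates the highest credit and is selected by default.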
A prototype version of the adaptation method described above has been implemented and tested for the case of users' preferences for different spatial reference frames. Using simple heuristics, the system adapts to varying user preferences for such reference frames. For more detailed information, see [4].
Whereas several multimodal systems realized so far concentrate on methods for the generation and presentation of multimodal output, we focus on the integration of multimodal input. To communicate instructions to the graphical system, natural language input and simple hand gestures indicating a direction can be used.
The problem of integrating the information from these two modalities into one multimodal input is to be solved by a multimodal input agency. This agency consists of several mode-specific input agents, i.e., a speech listener agent and a gesture listener agent, a global input data structure, and a coordinator input agent. The listener agents are responsible for receiving and analyzing the sensor data and for sending them to the coordinator input agent, which stores all incoming data in the global input data structure.
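A structural sketch of this agency might look as follows; the class names and the shape of the global input data structure are my own assumptions for illustration.

```python
# Structural sketch of the multimodal input agency; class names and the
# shape of the global input data structure are assumptions for illustration.

from dataclasses import dataclass, field
from typing import List


@dataclass
class InputEvent:
    modality: str      # "speech" or "gesture"
    content: str       # e.g. recognized phrase or indicated direction
    time_cycle: int    # cycle in which the listener perceived the input


@dataclass
class GlobalInputStore:
    events: List[InputEvent] = field(default_factory=list)

    def add(self, event: InputEvent):
        self.events.append(event)


class ListenerAgent:
    """Mode-specific agent that receives and analyzes raw sensor data."""

    def __init__(self, modality, coordinator):
        self.modality = modality
        self.coordinator = coordinator

    def perceive(self, raw_data, time_cycle):
        # Analysis of the sensor data is omitted; the result is forwarded.
        self.coordinator.receive(InputEvent(self.modality, raw_data, time_cycle))


class CoordinatorInputAgent:
    """Stores all incoming data in the global input data structure."""

    def __init__(self, store: GlobalInputStore):
        self.store = store

    def receive(self, event: InputEvent):
        self.store.add(event)


# Wiring: one speech listener, one gesture listener, one coordinator.
store = GlobalInputStore()
coordinator = CoordinatorInputAgent(store)
speech_listener = ListenerAgent("speech", coordinator)
gesture_listener = ListenerAgent("gesture", coordinator)
```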
To integrate the gestural and verbal inputs, the coordinator input agent has to decide which gesture belongs to which verbal input. In our approach, we want to achieve this synchronization by processing in time cycles, an approach motivated by temporal control mechanisms in humans. A gesture and a verbal input are interpreted as belonging together if they are perceived by the listener agents within the same time cycle. In this way, intuitive interaction modalities can be used simultaneously.
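Continuing the structural sketch above, cycle-based integration could be expressed as follows; the grouping function is my own illustration of the idea, not the project's actual algorithm.

```python
# Sketch of cycle-based integration: inputs perceived in the same time
# cycle are treated as belonging to one multimodal instruction.

from collections import defaultdict


def integrate_by_cycle(store: GlobalInputStore):
    """Group speech and gesture events that share a time cycle."""
    cycles = defaultdict(list)
    for event in store.events:
        cycles[event.time_cycle].append(event)
    return dict(cycles)


# Usage: "put the chair there" plus a pointing gesture, both in cycle 3.
speech_listener.perceive("put the chair there", time_cycle=3)
gesture_listener.perceive("direction: right", time_cycle=3)
instructions = integrate_by_cycle(store)
# instructions[3] now holds both events and can be interpreted together.
```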
In our setting, I want to use a combination of time-oriented and event-oriented techniques within the listener agents to decide when the processing of instructions can begin. In addition, the agents should use knowledge obtained from previous interactions to determine missing information.
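As a rough illustration of how such a combination might look, the following sketch (reusing the InputEvent type from above) starts processing either when both modalities have arrived or when a timeout of a few cycles expires, and falls back on the last perceived direction if the gesture is missing. The timeout value, the completeness test, and the history mechanism are assumptions of mine, not decisions already made in the thesis work.

```python
# Sketch combining event-oriented and time-oriented triggers, plus a simple
# interaction history for filling in missing information. All names and
# parameter choices are illustrative assumptions.

class IntegrationController:
    def __init__(self, timeout_cycles=2):
        self.timeout_cycles = timeout_cycles
        self.history = {}      # e.g. last perceived direction
        self.pending = []      # events of the instruction being assembled
        self.started_at = None

    def on_event(self, event, current_cycle):
        """Event-oriented trigger: start processing once the input is complete."""
        if self.started_at is None:
            self.started_at = current_cycle
        self.pending.append(event)
        if self._complete():
            return self._finish()
        return None

    def on_tick(self, current_cycle):
        """Time-oriented trigger: start processing after the timeout expires."""
        if (self.started_at is not None
                and current_cycle - self.started_at >= self.timeout_cycles):
            return self._finish()
        return None

    def _complete(self):
        modalities = {e.modality for e in self.pending}
        return {"speech", "gesture"} <= modalities

    def _finish(self):
        instruction = list(self.pending)
        gestures = [e for e in instruction if e.modality == "gesture"]
        if gestures:
            # Remember the direction for later, underspecified instructions.
            self.history["direction"] = gestures[0].content
        elif "direction" in self.history:
            # Fill in the missing gesture from a previous interaction.
            instruction.append(InputEvent("gesture",
                                          self.history["direction"],
                                          self.started_at))
        self.pending, self.started_at = [], None
        return instruction
```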