* Centre d'Etudes de la Navigation Aérienne 7 avenue Edouard Belin 31055 TOULOUSE CEDEX, FRANCE ** Centre d'Etudes de la Navigation Aérienne Orly Sud 205 94542 ORLY AEROGARES, FRANCE {chatty,lecoanet}@cena.dgac.fr
However, the need for new interfaces for air traffic control is becoming more and more apparent. Air traffic has been growing rapidly for the past ten years, and does not appear to be stabilizing. The increased workload has so far been absorbed sometimes by increasing the number of controllers, and most often by letting individual workloads grow. If the traffic keeps growing, more computerized systems and more efficient tools will be necessary. Despite the difficulties mentioned earlier, most air traffic control agencies have embarked on projects based on graphical interfaces. The past ten years have been spent applying technologies and techniques from other application domains, identifying the specifics of ATC, and developing solutions when necessary.
France has such a plan, which is reaching its final development stage and is aimed at replacing the current air traffic control workstations within the next few years. This plan, called PHIDIAS, makes heavy use of classical interaction techniques, and most of its original features are to be found in its hardware or low-level software components. PHIDIAS provides a standard WIMP (windows, icons, mouse, pointer) interface on two screens, one of which is very large. A lot of work has gone into identifying and solving the software issues raised by such a large screen, especially the management of many graphical objects moving independently.
Although those technical issues have now been solved, a number of design choices made in PHIDIAS can still be improved, if only because hardware, interaction techniques, and our knowledge of ATC have improved over time. This article describes the design of GRIGRI (grigri is a colloquial French word for scribble), the first prototype in project IMAGINE, which is a step toward the future generation of interfaces for air traffic control. GRIGRI implements design choices very different from current designs, and finds its inspiration in pen computing. In contrast with the notepads that made pen computing popular a few years ago, GRIGRI does not use handwriting recognition, but only the recognition of simple gestures, drawn with a pen or a fingertip. Those gestures (marks) are performed directly on the surface of the screen, thanks to a high resolution touch-screen. GRIGRI also uses direct manipulation for some functions and makes use of sampled sounds to improve the confidence of users and to make some interaction modes more explicit. Finally, the layout of GRIGRI was designed to encourage two-handed interaction.
This article is organized as follows. The first section is devoted to a description of the ATC task and the current environment of controllers. The second section identifies several drawbacks in the use of classical user interface techniques. The third section describes related work on gesture recognition and pen based computing. The following sections describe the design choices implemented in GRIGRI. The last section reports on preliminary evaluations of the system.
Figure 1. The current radar screen
Today's radar screens are usually circular and monochrome. They provide a top view of airspace, and represent each aircraft with an icon showing its current position, smaller icons showing a few of its past positions, a line representing its speed, and a label containing its call-sign, speed, and altitude (see figure 1). This display enables radar controllers to identify problems, solve them in real time, and monitor the implementation of the solution. In this process, they also benefit from flight plan data, which is printed on semi-rigid paper strips, called flight strips (see figure 2).
Figure 2. An annotated flight strip
The flight strips are printed by the computing systems a few minutes before the estimated entry of an aircraft into the sector. The planning controller, who assists the radar controller by planning his work and managing the communications with other sectors, grabs the strips as soon as they are printed. Each strip is then inserted in the set of current strips, and the planning controller signals possible problems through voice, gestures, or layout conventions. The strips, which feature the call-signs of aircraft, the altitude requested by their pilots, and the planned route, are then used by the radar controller, mainly as a reminder. Strips are gathered in groups according to the current situation, and annotated with the orders given to pilots, as shown on figure 2. Most annotations are codified and documented in manuals. Finally, when an aircraft leaves for another sector, the corresponding strip is thrown away, as a sign that the radar controller may forget about it.
These goals, as well as the set of available techniques, have had a strong influence on the design that was chosen for the new workstations. The screen gained even more importance and its size became critical. In addition to the radar image, which covers a 50 cm wide round screen in the current system, the screen also has to display the new "electronic strips", as well as the potential intelligent tools. Air traffic controllers, afraid of missing important information that would be hidden, are generally opposed to overlapping windows. It was thus decided to use screens as large as possible, namely 50 cm wide square screens, featuring 2048x2048 pixels, developed by Sony for the US FAA. But those screens are so large that users must be seated far from them if they want to see the whole contents. This immediately excluded any possibility of direct interaction on the screen, and led designers to use a mouse and classical interaction techniques: menus, buttons, double clicks, etc. Studies then focused on information coding (colors, icons, etc.) and on the software techniques needed to efficiently display and move many objects on a large screen.
On this screen, the main interactive areas are the fields of flight strips and aircraft labels on the radar image. Users may manipulate them to store a new heading or altitude, or to select an aircraft and display its route. All these manipulations require the designation of an aircraft or a strip, and the input of a command and its other parameters. Several commands are possible for nearly every active field (label, field of a label, part of a strip). Among the classical interaction techniques, menus seemed to be the most appropriate. But in order to minimize the number of items without having to use cascading menus, it was decided to associate a different menu with every field. For instance, figure 3 shows the menu associated with the "heading" field of a strip.
Figure 3. Manipulating a flight strip through menus.
The system that we just described is still being developed. However, several design choices have been criticized by future users. For instance, the manipulations on electronic strips are considered slow and tedious. It has been observed that with today's system, a controller writes an annotation on a strip every 17 seconds [6]. At such usage frequencies, manipulation times and the degree of attention required become critical. But manipulating menus is relatively slow, and users have to pay attention to what they are doing. Even worse, before selecting an item in a menu, a controller has to click in a zone which is sometimes very small. This increases manipulation times according to Fitts's law. In the case of flight strips, the decrease in performance is noticeable, compared to manual manipulations that are often performed in parallel with other tasks, without paying much attention. With no simple solution to such problems, a number of users ask for the status quo, and want to keep paper strips until alternative tools make them truly obsolete.
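For reference, the commonly used Shannon formulation of Fitts's law predicts the time MT needed to acquire a target of width W at distance D as

    MT = a + b \log_2 (D / W + 1)

where a and b are empirically determined constants. Small activation zones (small W) thus directly increase the index of difficulty, and hence the time spent on each of these very frequent manipulations.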
Even if electronic strips are discarded, manipulation times will still be an issue for radar images, and for every tool provided by the ATC system. This led us to explore possible solutions to that issue through new interaction techniques. Gesture recognition and mark-based input through a digitizing tablet or a touch screen look like a promising direction. Other techniques are also being explored, among which are speech recognition and computer vision. But gesture recognition seemed mature enough to be proposed to air traffic controllers for a medium-term implementation.
2D gestures, or marks, have been used in several ways in graphical interfaces. In addition to obvious applications to text input, gestures can be used to issue commands to the system. Kurtenbach has studied the issues raised by mark-based input, on the user's as well as on the system's side [8, 9]. Pie menus [3], although apparently close to traditional menus, are in fact closer to mark-based input, in that they use the orientation of gestures performed by the user. T-Cube even chains pie menus, thus associating each command with a broken line [13]. In T-Cube, menus are only displayed if the user hesitates while issuing a command. Unistrokes [7] fits between handwriting recognition and mark-based input: it models each letter as a simplified glyph composed of a single stroke, in the same way as shorthand.
Pen computers, or notepads, which use the techniques mentioned above, became popular at the beginning of the decade [10]. One of the first widely available environments was GO Corp's PenPoint. Apple's Newton uses handwriting recognition and mark-based input, combined with high quality feedback. But handwriting recognition accuracy often determines how such systems are accepted by users [5], and it is still difficult to achieve very high accuracy, especially in countries where cursive writing is dominant, like France. Graffiti, a commercial variation of Unistrokes by Palm Computing, improves the accuracy and thus the usability of such systems at the cost of learning a simplified alphabet.
Finally, it is interesting to note that French air traffic controllers have been familiar with touch-screens for decades. The Digitatron, introduced in the 60s, is a low-resolution touch-sensitive alphanumeric screen. It gives access to a videotext system, used for infrequent operations such as checking which military zones are activated or modifying a flight plan.
This prototype of flight strip management was soon enriched with a simplified radar image manipulated with gestures, so as to check how well the technique could be applied to interfaces where the use of gestures was not as obvious as for flight strips. This radar image was built with no button or menu, and every command was performed through gestures. There were gestures for zooming and panning, and for several operations on aircraft and beacons.
The results of this pilot study were very encouraging, as well as full of lessons. The lessons were related to the handling of errors, the acceptance of the system by users, and the type of applications that could easily take advantage of gesture recognition.
We used the lessons learned with this first prototype when developing the prototype described in the rest of this article. First, we investigated the set of available hardware allowing direct interaction on the screen, either by projecting images on a digitizing tablet (like Rank Xerox EuroPARC's Digital Desk [14]), or by using touch-screens. Second, finding no simple solution for moving flight strips around, we decided to concentrate on radar images, leaving flight strip management for future research. Finally, we paid more attention to feedback during interaction.
Figure 4. The physical setup of GRIGRI
The physical setup of our prototype uses a touch-screen. It includes a color screen built into the surface of a desk with an angle of approximately 30 degrees (see figure 4). The screen is covered with a high resolution touch-sensitive layer, which can be used with a pen or a fingertip. It is used to display a radar image and a bar of controls, as shown on figure 5. Depending on their type, commands are issued through the bar of controls, through gestures on the radar image, or through longer direct manipulation interactions.
Figure 5. The (reduced) screen of GRIGRI
With mark-based input, another choice can be made: do not specialize fields, but have several marks for the same operation, and have each mark code for a field. For instance, instead of having a mark for the command "modify a field", that can be applied to any field, let us have a mark for "modify speed", a mark for "modify flight level", and so on, with the meaning of a mark being independent of the location where it is issued. This design choice increases the number of commands that must be learned (about 15 different marks). But it dramatically lowers the constraints imposed on users, because gestures can be started anywhere on the label, and their sizes may vary much more. This choice leads us to expect more efficient interaction, at the expense of some training, which is acceptable for professional users such as air traffic controllers.
However, even if we were confident that specific gestures were a good solution, we also implemented generic commands as an alternative way of performing operations. Those generic commands (modify, highlight) are interpreted according to the field on which they are performed. By doing this, we allowed users to choose the interaction style they preferred, which gave us some hints about whether our reasoning was right. The set of all possible gestures is shown in figure 6.
Figure 6. The set of gestures recognized.
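The following sketch illustrates how the two styles just described can coexist in the command dispatch. It is given only as an illustration: the mark, field, and command names are invented for the sketch and do not reflect GRIGRI's actual vocabulary.

    /* Illustrative dispatch: specific marks carry their own meaning,
       while generic marks are interpreted according to the field on
       which they were drawn. Names and types are invented for this sketch. */

    typedef enum { MARK_MODIFY_SPEED, MARK_MODIFY_LEVEL, MARK_MODIFY_GENERIC } Mark;
    typedef enum { FIELD_SPEED, FIELD_LEVEL, FIELD_OTHER } Field;
    typedef enum { CMD_NONE, CMD_MODIFY_SPEED, CMD_MODIFY_LEVEL } Command;

    Command interpret_mark(Mark m, Field under_pen)
    {
        switch (m) {
        case MARK_MODIFY_SPEED:   return CMD_MODIFY_SPEED;   /* location-independent */
        case MARK_MODIFY_LEVEL:   return CMD_MODIFY_LEVEL;
        case MARK_MODIFY_GENERIC:                            /* meaning depends on the field hit */
            if (under_pen == FIELD_SPEED) return CMD_MODIFY_SPEED;
            if (under_pen == FIELD_LEVEL) return CMD_MODIFY_LEVEL;
            return CMD_NONE;                                 /* correct mark, but meaningless here */
        }
        return CMD_NONE;
    }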
To recognize gestures, GRIGRI uses Rubine's classifying algorithm [12]. This algorithm can classify gestures composed of a single stroke, and involves an individual training period. For each class of gestures, each user provides about fifteen examples, using a tool described later in this article. During this training, a dozen geometrical features are extracted from each example and used to characterize the class. Later, when using the system, the same features are extracted from every gesture drawn, and compared to those of the defined classes. The classifying algorithm identifies the class whose features are closest, or yields a recognition failure if the confidence ratio is too low. Classification is very fast (a few milliseconds), and response times are not perceptible to users. Having noticed that recognition failures had to be quickly and clearly signaled to users, we had to devise appropriate feedback. Considering that the visual channel was already very busy, we added sound capabilities to GRIGRI. Two sampled sounds are used: one signals recognition failures, and the other signals gestures that are correct but meaningless at the place where they are performed. However, we did not use sound for successful operations yielding an immediate visual feedback, so as to restrict the use of sound to situations where it is necessary.
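As a reminder of how such a classifier works, here is a minimal sketch of the decision step of a Rubine-style linear classifier [12]: each class scores the feature vector linearly, the best class wins, and low-confidence results are rejected. The training phase that estimates the weights, and Rubine's additional outlier rejection, are omitted; the names and the rejection threshold are ours, not GRIGRI's code.

    /* Decision step of a Rubine-style single-stroke classifier (sketch).
       The weights are assumed to have been estimated from the training
       examples; only the per-gesture classification is shown. */
    #include <math.h>

    #define NFEATURES 13                 /* Rubine's feature set has 13 geometric features */

    typedef struct {
        double w0;                       /* per-class bias term */
        double w[NFEATURES];             /* per-class feature weights */
    } GestureClass;

    /* Returns the index of the recognized class, or -1 for a recognition failure. */
    int classify(const double f[NFEATURES],
                 const GestureClass *classes, int nclasses,
                 double reject_threshold)
    {
        double score[nclasses];          /* C99 variable-length array */
        int best = 0;
        for (int c = 0; c < nclasses; c++) {
            score[c] = classes[c].w0;
            for (int i = 0; i < NFEATURES; i++)
                score[c] += classes[c].w[i] * f[i];
            if (score[c] > score[best]) best = c;
        }
        /* Confidence estimate: softmax probability of the winning class. */
        double denom = 0.0;
        for (int c = 0; c < nclasses; c++)
            denom += exp(score[c] - score[best]);
        return (1.0 / denom < reject_threshold) ? -1 : best;
    }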
To enter a flight level, one needs to specify an aircraft, the operation itself, and the new value. These data are entered as follows: like other operations, the command begins with a gesture on the label of the appropriate aircraft. GRIGRI then opens a window that allows the user to enter the new flight level. In many regions of airspace, only a few values are acceptable (330, 350, 370, 390, for instance), and PHIDIAS offers a menu for entering the new value, which also lets it propose a default. However, a number of air traffic controllers have to deal with many more possible flight levels, which makes this solution less acceptable. What we implemented in GRIGRI is the recognition of handwritten digits, in order to compare the efficiency of the two techniques. The input window has its own gesture classifier, independent from the one used in the radar image. It recognizes the ten digits (drawn with a single stroke) as well as gestures for correction and validation. The input window is fairly large, so as to let users write at their preferred size. In order to avoid masking a large portion of the radar image for too long, this window is semi-transparent, after Xerox's see-through tools [2] (see figure 7).
Changing the zoom factor involves a more classical type of interaction. The user first needs to go into zoom mode; this is done with a gesture, or by pressing a button in the bar of controls on the side of the screen. Then, the image is stretched or condensed by moving the pen outward or inward; the system goes back to normal mode when the user lifts the pen again. We implemented both ways of starting the operation because we hoped to identify whether users preferred a button or a gesture. We also hoped to encourage users to adopt forms of two-handed interaction: the bar of controls is located on the side of the non-dominant hand, while the pen is usually held in the dominant hand. This is a situation where using two hands may be useful: the non-dominant hand rests on the border of the screen and presses buttons on demand, while the other hand stays where the attention of the user is focused, and keeps holding the pen. Even though it is impossible to perform parallel actions with current touch-screens (which might be a problem, as described in [4]), we wanted to assess whether a purely sequential operation was possible.
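One simple way to implement the stretching described above, given only as an illustration of the interaction (the actual GRIGRI code may differ), is to rescale the image around the point where the drag started, proportionally to the pen's distance from that point:

    /* Illustrative zoom computation: while in zoom mode, every pen motion
       event rescales the image around the anchor point, proportionally to
       how far the pen has moved outward or inward. Names are ours. */
    #include <math.h>

    typedef struct { double x, y; } Point;

    typedef struct {
        Point  anchor;        /* pen position when zoom mode was entered */
        double initial_dist;  /* pen distance from anchor at the first motion event */
        double base_scale;    /* zoom factor of the image when the mode started */
    } ZoomState;

    double zoom_factor(const ZoomState *z, Point pen)
    {
        double d = hypot(pen.x - z->anchor.x, pen.y - z->anchor.y);
        if (z->initial_dist < 1.0 || d < 1.0)           /* avoid dividing by a few pixels */
            return z->base_scale;
        return z->base_scale * (d / z->initial_dist);   /* outward: >1, inward: <1 */
    }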
Figure 7. Entering a flight level
Finally, conscious that long interactions place the system in modes that should be made explicit to the user, we decided to use sounds. When the user starts a long interaction, or when the action is terminated, the system produces a fairly long sound of the kind usually associated with transitions, such as teleportations in science fiction movies. Though we have not yet planned to test the usefulness of such sounds, we believe they have a significant role in the perception users have of the results of their actions. However, air traffic controllers are still very cautious about sound, and this will need to be studied more closely.
Our system uses the X Input extension to the X Window System distributed by MIT, which had to be modified to suit our needs. With these extensions installed, the touch-screen replaces the mouse for every purpose. Gesture recognition is based on Rubine's algorithm, in its C version. The software layers that feed positions to it had to be modified to account for the slight unevenness of the screen, which caused the pen to bounce. These bounces were interpreted as the end of a mark and the start of a new one, which led to many interpretation failures. We introduced a time-based filter that removes most of those parasitic events.
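The filter can be sketched as follows (this is only an illustration of the idea, with names of our own; the actual implementation sits in the software layers between the X Input extension and the recognizer): a pen-up is held back for a short delay, and if a pen-down follows within that delay the pair is dropped, so the stroke continues unbroken.

    /* Time-based bounce filter (sketch). A pending pen-up that is quickly
       followed by a pen-down is considered a bounce and both events are
       swallowed; otherwise the pen-up is delivered and ends the mark.
       Flushing a pending pen-up on a timeout is omitted for brevity. */
    #include <stdbool.h>

    #define BOUNCE_DELAY_MS 30           /* threshold to be tuned empirically */

    typedef enum { PEN_DOWN, PEN_UP, PEN_MOVE } EventType;
    typedef struct { EventType type; long time_ms; int x, y; } PenEvent;

    static bool     up_pending = false;
    static PenEvent pending_up;

    /* Called for every raw event; deliver() forwards filtered events to the recognizer. */
    void filter_event(PenEvent e, void (*deliver)(PenEvent))
    {
        if (up_pending) {
            if (e.type == PEN_DOWN &&
                e.time_ms - pending_up.time_ms < BOUNCE_DELAY_MS) {
                up_pending = false;      /* bounce: swallow the up/down pair */
                return;
            }
            deliver(pending_up);         /* genuine end of stroke */
            up_pending = false;
        }
        if (e.type == PEN_UP) {
            pending_up = e;              /* hold until we know it is not a bounce */
            up_pending = true;
        } else {
            deliver(e);
        }
    }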
GRIGRI was built using a hybrid prototyping environment developed at CENA. This environment is based on a public domain Scheme interpreter: time-intensive functions are programmed in C and exported as primitives of our environment, while most of the system is programmed in Scheme. This approach, similar to that of Tcl [11], combines the efficiency of C with the flexibility of an interpreted language, and proved very useful for prototyping. In GRIGRI, the most important primitives used are the gesture recognition algorithm and a graphical widget dedicated to animated interactive images.
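To give an idea of this hybrid approach, here is a hypothetical sketch of how a time-critical C function can be exported as a primitive of an embedded Scheme interpreter. The interpreter API names (scheme_obj, scheme_make_int, scheme_define_primitive) are invented for the illustration; the public domain interpreter actually used is not named in this article.

    /* Hypothetical wrapper exporting a C function to an embedded Scheme
       interpreter. All interpreter entry points below are assumed, not real. */

    typedef struct scheme_obj *scheme_obj;                    /* opaque Scheme value (assumed) */
    extern scheme_obj scheme_make_int(long n);                /* assumed interpreter API */
    extern void scheme_define_primitive(const char *name,
                                        scheme_obj (*fn)(scheme_obj args));

    /* Time-critical work (here, gesture classification) stays in C. */
    extern int classify_current_gesture(void);

    /* Thin wrapper made available to Scheme code as (recognize-gesture). */
    static scheme_obj prim_recognize_gesture(scheme_obj args)
    {
        (void)args;                                           /* no arguments used */
        return scheme_make_int(classify_current_gesture());
    }

    void install_grigri_primitives(void)
    {
        scheme_define_primitive("recognize-gesture", prim_recognize_gesture);
    }

Most of the interface logic can then be written as ordinary interpreted Scheme code calling such primitives, which is what makes fast prototyping iterations possible.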
The second phase of the project, which is currently being carried out, is done with volunteer air traffic controllers. It is a first evaluation of how the system is accepted by controllers, as well as a means of getting data on how well the system works and how it is used. The data obtained will then be interpreted and used to improve the system, if it is considered promising. Other goals will then be set for the next evaluation phase, so as to progressively deepen our understanding of how this technology can be used in air traffic control systems. GRIGRI is evaluated by having it used in semi-realistic conditions by a radar controller. Simulated traffic is fed to the system and displayed on the radar image, while the controller is linked to pseudo-pilots through a simulated radio link. Measurements begin after one hour of simulation. What is observed is the way the radar controller interacts with the workstation. Some data is obtained by observing users and conducting interviews after every simulation. But the most objective data is obtained through the system itself, which is instrumented to output it. Among those data are the number of times every command is used, the frequencies of several types of errors, manipulation times, and even the trace of every gesture. Some errors which are difficult to detect automatically (errors on digits, for instance) are recorded by an observer through the keyboard of the workstation.
This second phase involved 12 volunteer air traffic controllers. The data obtained during this phase is currently being interpreted, and only rough figures are available. The first results, however, are very encouraging. The perception of the system by users is generally good, and they consider it as a potential alternative to classical interaction techniques, if it can be made reliable. The figures are encouraging as well. Out of a total of 5200 marks drawn by the 12 users, about 5% were gestures out of context, 5% were interpretation failures, and 1.3% were interpretation errors. Of course, these numbers will have to be interpreted, and the nature of interpretation errors and their impact will be studied carefully. But they show that mark-based input is an option that can be considered further in the context of air traffic control. We already have indications that these figures can be improved easily. For the first six users, for instance, parallax errors accounted for half of the marks drawn out of context. Those parallax errors have since been partially corrected by applying a geometrical transformation to the coordinates provided by the touch-screen. They would probably even disappear with a more appropriate device, using overhead projection for instance. Similarly, half of the interpretation failures have been identified as caused by technical problems that we should be able to solve (bounces still poorly handled, or insufficiently sampled gestures). After devising and applying solutions, we will still have to determine whether the resulting error rates are satisfactory, or if they can be further improved by other means. However, even with the current 90% accuracy, our users accepted the system well, probably because of the low cost of mistakes: redoing a mark is fast enough.
Although we have gathered a lot of data that will help us to improve the system (for instance by taking into account the probability of each command according to the context), our experiment will not allow us to evaluate all the design choices made in GRIGRI. For example, we will have to determine whether users discarded generic gestures (such as the straight line associated with "modify") because they forgot about their existence, as they put it, or for other reasons. Similarly, there were too few zoom actions (two) to gather any evidence about two-handed interaction. Finally, we now need to measure manipulation times with PHIDIAS in order to compare them with those obtained with GRIGRI.
While some expected results are missing, we also made unexpected observations that will influence future research. For instance, we discovered that controllers liked the fact that two of them were able to share a screen, talk over it, and interact with it one after another without having to exchange mice. This opens new perspectives in the management of the collaboration between a radar controller and a planning controller. Such properties of the system will be studied in future phases of experimentation. In doing so, we hope to advance our understanding of mark-based input while proposing an increasingly usable system.
2. E. Bier, M. Stone, K. Fishkin, W. Buxton, and T. Baudel. A taxonomy of see-through tools. In Proceedings of the ACM CHI, pages 358-364. Addison-Wesley, 1994.
3. J. Callahan, D. Hopkins, M. Weiser, and B. Shneiderman. An empirical comparison of pie vs linear menus. In Proceedings of the ACM CHI, pages 95-100, 1988.
4. S. Chatty. Extending a graphical toolkit for two-handed interaction. In Proceedings of the ACM UIST, pages 195-204. Addison-Wesley, Nov. 1994.
5. C. Frankish. Recognition accuracy and user acceptance of pen interfaces. In Proceedings of the ACM CHI, pages 503-510. Addison-Wesley, 1995.
6. C. Garoff-Mercier and M. Chollet. Analyse de l'activité des contrôleurs du trafic aérien : utilisation des vecteurs d'information et des communications. Technical Report R90-07, Centre d'Etudes de la Navigation Aérienne, 1990.
7. D. Goldberg and C. Richardson. Touch-typing with a stylus. In Proceedings of the ACM CHI, pages 80-87, 1993.
8. G. Kurtenbach and T. Baudel. Hypermarks: issuing commands by drawing marks in Hypercard. In CHI'92 Posters and Short Talks, page 64, 1992.
9. G. Kurtenbach and W. Buxton. Issues in combining marking and direct manipulation techniques. In Proceedings of the ACM UIST, pages 137-144, 1991.
10. A. Meyer. Pen computing. A technology overview and a vision. SIGCHI Bulletin, 27(3):46-90, 1995.
11. J. K. Ousterhout. Tcl and the Tk toolkit. Addison-Wesley, 1994.
12. D. H. Rubine. The automatic recognition of gestures. PhD thesis, Carnegie Mellon University, 1991.
13. D. Venolia and F. Neiberg. T-Cube: a fast, self-disclosing pen-based alphabet. In Proceedings of the ACM CHI, pages 265-270, 1994.
14. P. Wellner. Interacting with paper on the Digital Desk. Communications of the ACM, 36(7), July 1993.