Muneo Kitajima
National Institute of Bioscience and Human-Technology
1-1 Higashi Tsukuba Ibaraki 305, JAPAN
Tel: +81 (298) 54-6730
E-mail: kitajima@nibh.go.jp
Peter G. Polson
Institute of Cognitive Science
University of Colorado
Boulder, CO 80309-0345, USA
Tel: +1 (303) 492-5622
E-mail: ppolson@psych.colorado.edu
This paper describes a comprehension-based model of how experienced Macintosh users learn a new application by doing a task presented as a series of exercises. A comprehension mechanism transforms written instructions into goals that control an action planning process proposed by Kitajima and Polson [11]. The transformation process is based on a theory of solving word problems developed by Kintsch [8,9]. The comprehension and action planning processes define constraints on the wording of effective instructions. The combined model is evaluated using data from Franzke [3]. We discuss implications of these results for Minimalist Instructions [1] and Cognitive Walkthroughs [17].
cognitive theory; display-based systems; exploration
The goal of this article is to describe the LICAI model*, which extends the Kitajima and Polson [11] action planning model of skilled, display-based, human-computer interaction to account for learning by exploration. Learning by exploration involves discovering how to do a novel task by generalizing from past experience or by searching successfully for a correct action sequence. The LICAI model describes how experienced Macintosh users accomplish a task using a new application by doing a series of exercises. These users are familiar with the standard Macintosh interface conventions. They must combine their existing knowledge of the interface conventions with the task described in each exercise to generate the actions required by the new application to perform each task.
The LICAI model incorporates processes from Kintsch's [8,9] theory of text comprehension. The LICAI model uses comprehension strategies to transform written instructions into goals. In this paper, we propose three comprehension strategies in the form of schemata. Schemata are specialized knowledge structures whose slots are filled with crucial elements extracted from the instructions. The outputs of the comprehension processes are goals that control the action generation processes.
The LICAI model uses the action planning processes developed by Kitajima and Polson [11]. Their action planning model can simulate the behavior of skilled users interacting with a graphing application hosted on the Macintosh. The comprehension-based model of goal formation processes and the model of action planning processes are combined to describe the behavior of experienced Macintosh users learning a new application. The LICAI model defines constraints on the wording of effective instructions.
The LICAI model is evaluated using data from Franzke [3], who had experienced Macintosh users do a graphing task using one of three different graphing applications with which they had no prior experience. The graphing task was presented to users as a series of exercises. The instructions for each exercise contained no information about the action sequences required to complete each subtask. Franzke's [3] experimental task is similar to actual learning and use of an application. Users formulate tasks for themselves or are given a written specification for a new task.
Franzke [3] found large variations in difficulty of subtasks both within and across different applications. We argue that Franzke's [3] participants had to comprehend the goal of each exercise and then infer, or try to discover by exploration, the action sequence that would accomplish the goal. The goal formation processes attempted to build specialized goals required by Kitajima and Polson's action planning model [11] to generate correct actions for a subtask. These goal structures provided links between users' understanding of a subtask and the low-level details of the interface to the new application (e.g., menu labels).
Building these linking goal structures is a difficult and highly specialized comprehension task, analogous to processes required to successfully solve word problems studied by Kintsch [8,9]. The goal formation processes can fail. Users may not have all of the necessary comprehension strategies and/or knowledge about the task or about the application interface. Our theoretical analysis of Franzke's [3] results refines and extends her analyses, which were based on the model of learning by exploration [17] underlying the Cognitive Walkthrough [18,19].
Kitajima and Polson's [11] model of action planning is synthetic in that it attempts to integrate the views of numerous researchers on the nature of display-based, human-computer interaction (e.g., [7]), theoretical ideas about the nature of display-based problem-solving (e.g., [13]), action planning [15], and task and device representations [16].
There are two core ideas underlying the action planning model. First, Kitajima and Polson [11] proposed
that display-based HCI is analogous to text comprehension. In reading text, readers use large amounts of
knowledge to comprehend the meaning of texts. In display-based HCI, users must comprehend the
display and then select appropriate actions with the help of knowledge about the interface, the task to be
performed, and so on. Second, the model is mapped onto Hutchins et al.'s [7] analysis of direct
manipulation based on their action theory framework. This framework describes action planning as a goal
driven process that evaluates the consequences of the last action and then generates the next action to be
executed.
(see Figure 1)
The construction-integration cycle is a two-phase process. In the first phase, a network of propositions is created that contains possible alternative meanings of the current sentence or fragment. The construction process generates an associative network whose nodes are propositions representing the input text, the meanings of words in the input text retrieved from long-term memory, the current context, and the reader's goals. Construction is a bottom-up process that is not guided by context. Thus, at the end of the construction process, the model has multiple possible meanings for the input text.
The integration process, the second phase, selects an interpretation of the input sentence consistent with the current context and the reader's goals. The integration process is connectionist in nature and uses a spreading activation mechanism. The most highly activated nodes in the network represent the reader's interpretation.
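The integration phase can be illustrated with a small spreading-activation sketch. The four-node network, its node names, the link weights, and the settling rule used here (repeated multiplication by the link matrix, clipping negative values to zero, and renormalizing) are illustrative assumptions, not values or code taken from the model.

```python
# Hypothetical network left by the construction phase: a goal node and
# three propositions. All names and weights are made up for illustration.
NODES = ["goal", "meaning_A", "meaning_B", "unrelated"]
LINKS = [
    # goal   A     B    unrelated
    [1.0, 0.8, 0.2, 0.0],   # goal
    [0.8, 1.0, -0.5, 0.0],  # meaning_A (strongly linked to the goal)
    [0.2, -0.5, 1.0, 0.1],  # meaning_B (competes with meaning_A)
    [0.0, 0.0, 0.1, 1.0],   # unrelated
]


def integrate(links, max_iters=100, tol=1e-6):
    """Spread activation over the network until the pattern settles."""
    n = len(links)
    act = [1.0 / n] * n  # assumption: all nodes start equally active
    for _ in range(max_iters):
        # Each node collects activation from its neighbors; negative
        # net input is clipped to zero (an assumption of this sketch).
        new = [max(0.0, sum(links[i][j] * act[j] for j in range(n)))
               for i in range(n)]
        total = sum(new)
        if total == 0:
            break
        new = [a / total for a in new]
        if max(abs(a - b) for a, b in zip(new, act)) < tol:
            break
        act = new
    return act


activation = integrate(LINKS)
interpretation = max(NODES[1:], key=lambda name: activation[NODES.index(name)])
```

In this toy network, the interpretation linked most strongly to the goal ends up with the highest activation among the non-goal nodes, while its competitor is suppressed.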
Mannes and Kintsch [15] extended the construction-integration theory to action planning. Their task domain was human-computer interaction. Their action-planning model took as input a representation of users' or planners' goals, a propositional representation of the text containing the task description, and a very schematic representation of the task context. Their model generated the commands required to perform the task described in the text. Mannes and Kintsch argued that text comprehension and action planning can be conceived as similar tasks. Readers and planners must integrate their goals and information from other diverse sources to select one out of many alternative interpretations of a text or one out of many competing plans for action.
The model simulates Hutchins et al.'s [7] evaluation stage (shown in Figure
1) by elaborating the display representation with knowledge retrieved from long-term memory. The
retrieval cues are the task and device goals and the propositions representing the current display. The
probability that a cue retrieves a particular proposition representing a piece of knowledge in long-term
memory is proportional to the strength of the link between them. The propositions in long-term memory
represent knowledge about the screen objects. For example, if Object23 is the scrolling list item labeled
Serial Position, then the following knowledge items are stored in long-term memory about Object23:
This process is dominated by two factors: first, the strength of the links from the goals to propositions in the network that share arguments with the goals; and second, the number of propositions necessary to link goals to candidate objects. As a result, the action planning model selects candidate objects closely related to the task and device goals. Device goals can directly specify a screen object, and thus can be directly linked to the screen object represented in the network. Task goals can be linked to screen objects through labels. Thus, the task goal shown in (2) is linked to the object representing the variable Serial Position in the X-axis scrolling list by the overlap of the labels that are part of the display representation.
The second construction-integration cycle selects an action to be performed on one of the three candidate objects. During the construction phase of this second cycle, the model generates a network with representations of all possible actions on each candidate object. Examples would include single-clicking and moving the screen object labeled Serial Position in the X-axis scrolling list. At the end of the second integration phase, the action planning model selects the most highly activated object-action pair as the next action to be executed. The process is dominated by the same two factors described above. However, the relevant interaction knowledge must be retrieved during the evaluation stage. For example, the action planning model must retrieve the fact that objects in the scrolling list can be selected.
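The two selection steps can be sketched as follows, assuming the integration phases have already produced activation values. The object names, action names, and numbers are hypothetical; only the limit of three candidate objects and the choice of the most highly activated object-action pair come from the description above.

```python
def select_candidates(object_activation, k=3):
    """First cycle: keep the k most highly activated screen objects."""
    ranked = sorted(object_activation, key=object_activation.get, reverse=True)
    return ranked[:k]


def select_action(pair_activation, candidates):
    """Second cycle: pick the best object-action pair among the candidates."""
    legal = {pair: a for pair, a in pair_activation.items()
             if pair[0] in candidates}
    return max(legal, key=legal.get)


# Hypothetical activation values after each integration phase.
objects = {
    "serial_position_item": 0.9,
    "x_axis_list": 0.7,
    "observed_item": 0.4,
    "ok_button": 0.2,
    "cancel_button": 0.1,
}
pairs = {
    ("serial_position_item", "single-click"): 0.8,
    ("serial_position_item", "move"): 0.3,
    ("x_axis_list", "single-click"): 0.5,
    ("ok_button", "single-click"): 0.95,  # ignored: not a candidate object
}

candidates = select_candidates(objects)
next_action = select_action(pairs, candidates)
```

Restricting the second cycle to the candidate objects means a highly activated action on a non-candidate object cannot be selected.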
The Kitajima [10] simulations focused on the two actions related to the task and device goals given in (2) and (3). The correct action sequence involved moving the mouse cursor to point at the label Serial Position in the X-axis scrolling list, followed by single-clicking on that object. The simulation experiments started with a display representation defined by Figure 2, and then the model attempted to perform the sequence of correct actions.
Kitajima [10] found that the action planning model can reliably generate the correct action sequence with no device goal. However, the task goal had to be stated exactly as given in (2). A perfectly reasonable task goal like "Plot Observed as a function of Serial Position" does not work. Kitajima [10] concluded from his simulations that the task goal had to be directly linked to the labels for the X-axis scrolling list and for the correct object in that scrolling list.
Furthermore, Kitajima [10] found an effect of the number of competing screen objects. When the screen representation included both the dialog box and the data table in the background, which carried the distracting label Serial Position, the additional objects prevented the action planning model from generating the correct action sequence. However, limiting the focus of attention to the nine screen objects defined by the dialog box shown in Figure 2 enabled the specific task goal shown in (2) to generate the correct action sequence.
The action planning model always performed the correct actions given the device goal (3). The direct link between the device goal and the correct screen object caused the model to include the correct screen object in the list of the three candidate screen objects during the first phase of the execution stage. During the second phase, when selecting the correct action, information retrieved from long-term memory enabled the model to decide that the only possible action was to single-click on this object. The action planning model performed the task perfectly because the correct action was the only possible action.
Kitajima's [10] results show that we can extend Kitajima and Polson's [11] original action planning model to account for how users with extensive background experience learn a novel application by describing how they formulate very specific task and device goals. Our goal-formation model assumes that this process is analogous to solving word problems. The text comprehension processes take a semantic representation of the next task as input and combine this representation with highly specialized background knowledge to generate the required task goals. Device goals are acquired by interacting with the interface.
The original problem statement "Plot Observed as a function of Serial Position" is transformed by two task specific problem schemata associated with the task Plot:
(4) Put variable-label1 on X-axis
(5) Put variable-label2 on Y-axis
In addition, specialized comprehension knowledge incorporated into the schema is required to fill the slots (i.e., "as a function of" means that the variable label before the phrase is put in the Y-axis slot, and the variable after the phrase is put in the X-axis slot). In the following section, we propose problem schemata that map instructions into task and device goals.
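As a rough illustration of this slot-filling knowledge, the sketch below maps a "Plot ... as a function of ..." statement onto goals of the form shown in (4) and (5). The function name and the use of a regular expression are assumptions made for this example; the model itself works on propositional representations rather than raw strings.

```python
import re


def plot_goals(instruction):
    """Fill the X-axis and Y-axis slots from a 'Plot ...' statement."""
    match = re.fullmatch(r"Plot (.+) as a function of (.+)", instruction.strip())
    if match is None:
        return None  # the schema does not apply to this instruction
    # The label before "as a function of" goes to the Y-axis slot,
    # the label after it goes to the X-axis slot.
    y_label, x_label = match.group(1), match.group(2)
    return [f"Put {x_label} on X-axis", f"Put {y_label} on Y-axis"]


goals = plot_goals("Plot Observed as a function of Serial Position")
# goals == ["Put Serial Position on X-axis", "Put Observed on Y-axis"]
```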
______________________________________
TASK schema
task-action: put
task-object: Serial Position
task-specification: on X-axis
_______________________________________
The resulting task goal description is represented by two propositions: (perform put Serial_Position) and
(location-of Serial_Position on_X-axis). Kitajima and Polson's [11] action planning processes then
generate the sequence of actions that achieves this task goal.
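The mapping from a filled TASK schema to propositions can be sketched as a small data structure. The class and method names are hypothetical, and the sketch assumes the task specification is always a location, as in this example.

```python
from dataclasses import dataclass


@dataclass
class TaskSchema:
    task_action: str
    task_object: str
    task_specification: str

    def propositions(self):
        """Return the task goal as (predicate, argument, argument) tuples."""
        obj = self.task_object.replace(" ", "_")
        spec = self.task_specification.replace(" ", "_")
        # Assumption: the specification names a location, so it is
        # expressed with the location-of predicate.
        return [("perform", self.task_action, obj),
                ("location-of", obj, spec)]


schema = TaskSchema("put", "Serial Position", "on X-axis")
task_goal = schema.propositions()
# task_goal == [("perform", "put", "Serial_Position"),
#               ("location-of", "Serial_Position", "on_X-axis")]
```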
The transformations performed by the TASK schema can be complex. The original task instructions can contain information irrelevant to the task goal, and thus the TASK schema must summarize the instructions to generate a task goal. The transformation shown in the above example is a simple paraphrase into a form that links directly to the labels of screen objects defined by the interface. The TASK schema can also generate necessary elaborations of terse instructions.
The DO-IT schema maps instructions that describe a single legal action for the interface on a screen object with various attributes into a description of the form "perform device-action on device-object with additional device-specification." For example, the instruction, "Click on Serial Position in the X-axis scrolling list," is transformed into the following instance of the DO-IT schema:
________________________________________
DO-IT schema
device-action: single-click
device-object: $ ; variable undefined
ID: scrolling-list-item
attribute
label: Serial Position
location: X-axis scrolling list
________________________________________
The resulting task goal is the following set of propositions: (perform single-click $),
(isa $ scrolling-list-item), (has-label $ Serial_Position), (location-of $ X-axis_scrolling_list).
Observe that this task goal specifies a single action. Its arguments link to the screen
representation and action representation. The action planning processes can generate the specified step.
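One way to picture how the undefined variable $ links to the screen representation is as a constraint match against the objects on the display. The screen contents below are hypothetical apart from the Serial Position item in the X-axis scrolling list, and exact matching is a simplification of the activation-based linking used by the model.

```python
GOAL = [
    ("perform", "single-click", "$"),
    ("isa", "$", "scrolling-list-item"),
    ("has-label", "$", "Serial_Position"),
    ("location-of", "$", "X-axis_scrolling_list"),
]

# Hypothetical display representation: object id -> set of (predicate, value).
SCREEN = {
    "Object23": {("isa", "scrolling-list-item"),
                 ("has-label", "Serial_Position"),
                 ("location-of", "X-axis_scrolling_list")},
    "Object31": {("isa", "scrolling-list-item"),
                 ("has-label", "Serial_Position"),
                 ("location-of", "Y-axis_scrolling_list")},
    "Object40": {("isa", "button"),
                 ("has-label", "OK"),
                 ("location-of", "dialog_box")},
}


def bind_variable(goal, screen, var="$"):
    """Return the screen objects that satisfy every constraint on `var`."""
    constraints = {(pred, value) for pred, subject, value in goal
                   if subject == var}
    return sorted(obj for obj, facts in screen.items() if constraints <= facts)


bound = bind_variable(GOAL, SCREEN)
# bound == ["Object23"]
```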
The DEVICE schema transforms experiences interacting with the interface into device goals. The schema generates one or more propositions of the form "realize device-object is-in-device-state with additional device-specification." The result is a device goal like (3). For example, the experience of successfully highlighting Serial Position in the X-axis scrolling list generates the following instance of the DEVICE schema:
______________________________________________
DEVICE schema
device-object: Object23
ID: list-item
attribute
label: Serial Position
location: X-axis scrolling list
display-state: highlighted
associated-task-goal: perform "put Serial Position on X-axis"
______________________________________________
Experienced users employ the current task goal and/or display as a cue to retrieve device goals from
long-term memory.
In this section, we evaluate the LICAI model using data from Franzke [3]. Experienced Macintosh users were given the task of creating a new graph with a novel graphing application, Cricket Graph I** or III***, or one of two forms of the EXCEL 3.0**** interface. The graphing task was divided into two subtasks. The first was to create a default line graph by opening a document containing the data to be plotted, selecting the correct graph style (e.g., line graph) from a menu, and assigning the designated variables to the X- and Y-axes. The second subtask was to edit the default line graph. The edits were done in a specific order. The descriptions of the edits were very terse. Participants learned to do subtasks by exploration. If they had not made any progress toward the next correct action for more than 2 minutes on a particular step, they were given brief hints like "select line graph from the graph menu," or "double-click on legend text."
The goal formation processes of the LICAI model have been simulated using Franzke's [3] experimental paradigm [12]. The LICAI model makes only coarse-grained predictions about the behavior of Franzke's participants. The instructions and the schemata assumed by the model may or may not enable participants to generate the correct task goal. If they generate the correct task goal, the action planning processes will generate the correct action sequence. However, the LICAI model does not describe the search behavior that occurs if the task goal construction process fails. We account for the initial success or failure of the goal formulation process. If instructions for a given exercise contain the necessary information, the comprehension processes will generate the goals that enable the action planning processes to generate the correct action sequence for the exercise. Thus, the LICAI model partitions the exercises given to Franzke's participants into tasks that can be done with little or no trial-and-error search and those that the model cannot perform because it cannot generate the necessary task goal from the instructions. Even so, the model is able to generate a qualitative account of Franzke's results.
Label following is consistent with the LICAI model. If instructions contained a description of either an action or an object that matched a screen object label, those labels were preserved when the propositional representation of the instructions was mapped into a task goal. The links between the task goal and the correct screen object can mediate performance of the correct action if participants have the necessary knowledge about the screen object.
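A minimal sketch of label following, assuming it can be approximated by checking which screen labels appear verbatim in the instruction. The label table is hypothetical, and case-insensitive substring matching is a simplification; the model establishes these links through shared arguments in a propositional network.

```python
def label_matches(instruction, screen_labels):
    """Return ids of screen objects whose label appears in the instruction."""
    text = instruction.lower()
    return [obj for obj, label in screen_labels.items()
            if label.lower() in text]


# Hypothetical labels for a few objects in the dialog box.
SCREEN_LABELS = {
    "Object9": "X-axis",
    "Object23": "Serial Position",
    "Object24": "Observed",
}

linked = label_matches(
    "Click on Serial Position in the X-axis scrolling list", SCREEN_LABELS)
# linked == ["Object9", "Object23"]
```

An instruction that shares no wording with any label yields an empty list, which corresponds to the cases where label following gives the user no support.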
Because all participants were experienced Word users, we can assume that they had the knowledge necessary to elaborate this cryptic instruction by assigning appropriate attributes to Geneva, 9, bold, and knowledge to transform them using the TASK schema into a series of subtask goals for font, size, and style. The following is the TASK schema instance for font:
_____________________________________
TASK schema
task-action: change
task-object: legend-text
task-specification
attribute: font
target: Geneva
_____________________________________
Observe that the task instructions do not provide any support for discovering the required action, double-click.
Franzke [3] found that almost all her participants had to be given a hint to double-click on legend-text. There was no evidence that they had any difficulty forming the correct task goal, change the legend-text. The screen object representing legend-text could easily be identified if participants had general knowledge about graphs. The task goal overlapped with the label for that screen object, so the action planning model would include the correct screen object as one of the three candidate screen objects for action. Participants did not know that the legend-text could be double-clicked, or that to edit the legend-text, it must be double-clicked. Thus, there was no link between the action specified by the task goal, change, and the action required by the device to complete this task, double-click. The action planning model will never generate the correct action without these links.
Once the dialog box was open, participants had no trouble completing the task. This result is consistent with the LICAI model's behavior. The model can perform each subtask specified by the instruction because the subtask goals link directly to a scrolling list title and to the relevant item in the scrolling list.
There were large practice effects. Mean task completion times dropped from about 15 min. for the first attempt on the task to about 7 min. on the second attempt. There was a small effect of delay of about 1.5 min. Most improvements resulting from practice were found on tasks where terms used in the instructions did not match labels on the interface (see ref. [3], figures 5, 6, 7, and 8).
Consider the subtask of moving the legend. A significant number of participants had some difficulty with this task during the first session. When this difficulty occurred, the experimenter gave a hint, "Grab the legend which is to the right of the plot symbol, open circle, labeled as Observed." The hint would be comprehended by instantiating the DO-IT schema, and the results of performing the hint would be encoded by the following instance of the DEVICE schema:
_______________________________________________
DEVICE schema
device-object: Object56
ID: legend-text
attribute
label: Observed
display-state: grabbed
associated-task-goal: perform "move legend"
_______________________________________________
Object56 represents the legend-text on the graph. Participants would also acquire knowledge that would
enable them to correctly recognize new objects as a legend. During the second session, participants were
asked to do the same task, "move the legend," but with a different graph and data. The TASK schema would
generate the identical task goal, which would serve as a retrieval cue for the device goal. Participants would
have acquired the knowledge to recognize Object67 as a legend and replace Object56 with Object67.
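The retrieval and rebinding described here can be sketched as follows. The class, field, and function names are hypothetical, exact string matching on the task goal stands in for cue-based retrieval from long-term memory, and the second-session screen contents other than Object67 are invented for the example.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DeviceGoal:
    device_object: str
    kind: str                  # the schema's ID slot, e.g. "legend-text"
    display_state: str
    associated_task_goal: str


# Device goal stored after following the hint in the first session.
MEMORY = [DeviceGoal("Object56", "legend-text", "grabbed", "move legend")]


def retrieve(task_goal, memory):
    """Use the task goal as a cue for stored device goals."""
    return [g for g in memory if g.associated_task_goal == task_goal]


def rebind(goal, screen):
    """Point a retrieved goal at the current object of the same kind."""
    for obj, kind in screen.items():
        if kind == goal.kind:
            return replace(goal, device_object=obj)
    return None


# Second session: a different graph, so the legend is a different object.
SECOND_SCREEN = {"Object70": "plot-symbol", "Object67": "legend-text"}

retrieved = retrieve("move legend", MEMORY)
current_goal = rebind(retrieved[0], SECOND_SCREEN)
# current_goal.device_object == "Object67"
```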
In summary, the large improvements in performance that Franzke [3] observed resulted from improvements on subtasks where the TASK or DO-IT schemata could not generate an effective task goal. One result of successfully performing the interaction is to gain information required by the DEVICE schema to generate the correct device goal. Participants were likely to retrieve these device goals on the second attempt at the task. Kitajima [10] showed that the action planning model always performed correctly when given the correct device goal.
We developed and evaluated the LICAI model of display-based human-computer interaction, which has goal formation processes and action planning processes, both based on the construction-integration model. The goal formation processes transform initial task descriptions into the precise goals that enable the action planning processes to generate the correct actions. These processes are specialized comprehension strategies that employ task- and interface-specific schemata to construct goals. The action planning processes use representations strongly constrained by the superficial details of the interface (i.e., labels, menus, and buttons) and the interaction conventions of the host operating system. The goal formation processes must transform task descriptions into goals that link directly to the action planning representations. Most of the power and flexibility of the LICAI model is in the goal formation component.
The development of the Minimalist Instruction paradigm was stimulated by the then-surprising result that detailed and carefully designed training and reference materials for early versions of word-processors were unusable (e.g., [14]). This result is not surprising in light of research on word problems and the LICAI model. Mack et al.'s [14] participants did not have the necessary schemata or action planning knowledge, and attempts to include explicitly all necessary background information led to long and confusing documentation for these new users.
Our LICAI model suggests that strong constraints on the content of instructional materials exist. Materials generated by minimalist design heuristics are constrained by the interface to an application and users' background knowledge. Although minimalist instructional materials are designed to support learning by exploration, the interface also must facilitate learning by exploration. Otherwise, instructional materials must provide a step-by-step description of how to perform every task. Carroll [1] and his collaborators have shown that most users are unwilling to read such detailed instructions, and users have a great deal of difficulty even if they try to use the step-by-step instructions.
Our LICAI model can be used to develop explicit design guidelines for content. The model's focus on task goals is consistent with the minimalist paradigm. The model makes very clear the kinds of constraints that must be understood in following Carroll's [1] design heuristic of minimizing the amount of written material. We showed that comprehending the very terse instructions used by Franzke [3] required specialized background knowledge about task and interface. "Change the legend text to: Geneva, 9, bold" cannot be understood by someone who has no experience with a modern word-processor. Effectively minimizing the amount of written material requires careful attention to the action and display knowledge and schemata assumed in the user population. A minimalist version of a complete manual for a modern word-processor would have to assume that users have the TASK and DO-IT schemata described in this paper.
The analysis presented in this article strongly reinforces the importance of the label following strategy. In addition, it shows that even when a correct task goal is generated by instantiating TASK and/or DO-IT schemata, a significant amount of background knowledge is needed to select correct actions. These results provide support for the Cognitive Walkthrough [19] methodology, which evaluates the effectiveness of the label following strategy and characterizes the background knowledge necessary to infer correct actions, serving as a design evaluation technique for application interfaces that support learning by exploration.
The authors gratefully acknowledge research support from the National Science Foundation Grant IRI 91-16640. We thank Walter Kintsch and Clayton Lewis for ideas, direction, and comments on earlier versions of this paper.
* LICAI is an acronym for the LInked model of Comprehension-based
Action planning and Instruction taking. When LICAI is pronounced like "Lee CHI," the
pronunciation represents a two-kanji Japanese word that means comprehension.
** CA Cricket Graph, version 1.3.2, 1989.
*** CA Cricket Graph III, version 1.01, 1992.
**** MS EXCEL, version 3.0, 1990.