IEEE Aero99
All content following this page was uploaded by David E. Smith on 10 June 2014.
Richard Washington¹, Keith Golden, John Bresina, David E. Smith, Corin Anderson², Trey Smith³
NASA Ames Research Center, MS 269-2
Moffett Field, CA 94035
650-604-5000
{richw | kgolden | bresina | de2smith | corin | trey}@ptolemy.arc.nasa.gov
ABSTRACT: The Pathfinder mission demonstrated the potential for robotic Mars exploration but at the same time indicated the need for more robust rover autonomy. Future planned missions call for long traverses over unknown terrain, robust navigation and instrument placement, and reliable operations for extended periods of time. Ultimately, missions may visit multiple science sites in a single day and perform opportunistic science data collection, as well as complex scouting, construction, and maintenance tasks in preparation for an eventual human presence. Significant advances in robust autonomous operations are needed to enable these types of missions.

Towards this end, we have designed an on-board executive architecture that incorporates robust flexible operation, resource utilization, and failure recovery. In addition, we have designed ground tools to produce and refine contingent schedules that take advantage of the on-board architecture’s flexible execution characteristics. Together, the on-board executive and the ground tools constitute an integrated rover autonomy architecture.

TABLE OF CONTENTS

1. INTRODUCTION
2. CURRENT AND FUTURE ROVER AUTONOMY
3. ARCHITECTURE OVERVIEW
4. FIELD TEST
5. FUTURE WORK
6. CONCLUSIONS

1. INTRODUCTION

Mars exploration is a high priority at NASA. Current plans call for a mission to Mars every 26 months. Future missions will require rovers with more capabilities than have ever been demonstrated on previous missions. The rover to be launched in 2003 will be expected to survive for more than a year and traverse more than 10 kilometers. By contrast, the Sojourner rover in the 1997 Mars Pathfinder mission was only expected to survive for a few weeks and stayed within sight of the lander. Ultimately, in support of human exploration, rovers will be needed to scout out landing sites, find water and other resources, set up power plants, and mine and transport resources.

These capabilities will require rovers with robust autonomous operations so that they can perform independently over long intervals while achieving mission and science goals. The rovers will communicate infrequently with Earth, and the communication will have high latency and low bandwidth. Rovers that are continually dependent on commands from Earth would incur huge cost and would achieve a much lower science return due to the lost time that the rover is waiting for instructions. Given that rovers have a limited lifetime, such wasted opportunity translates to much lower return on investment.

Robust autonomous operations will also be important in support of human presence on Mars. Even though latency will be less of a problem, the crew will be busy with more important tasks and will not have the time to continuously monitor the rovers and control their every move. Thus, autonomous rovers can greatly amplify the productivity of the small crew. It is also advantageous for the rovers to be responsible for their own well-being: to avoid getting damaged, as well as to diagnose and correct recoverable software and hardware failures. With robust autonomous rovers, the crew does not have to spend time baby-sitting rovers and can concentrate on survival and science.

Although much research has been done to endow autonomous robots with far greater capabilities than were demonstrated on Sojourner, little of that can be used directly for planetary missions. On space missions, safety is paramount. Missions are very expensive, and even the simplest mistakes can cause the mission to fail. If the rover became permanently stuck on a rock or damaged, the effect on the mission would be disastrous. Thus, on-board technology will always be more conservative than what has been tried in a laboratory setting.

In the following section, we discuss the capabilities of current rovers and the additional capabilities needed to support planned missions. In Section 3, we discuss a rover autonomy architecture that provides these needed capabilities. In Section 4, we discuss plans for an upcoming field test that will demonstrate some of these capabilities. We then discuss future directions of this work.

U.S. Government work not protected by U.S. copyright
¹ Caelum Research Corporation at NASA Ames Research Center
² University of Washington
³ Carnegie Mellon University

2. CURRENT AND FUTURE ROVER AUTONOMY

Criteria for robust autonomous rovers

For a rover to demonstrate robust autonomous operations, it will need to adapt to its environment. Although someday this may be performed through automatic, on-board modification of plans and models, this could produce unanticipated behaviors and thus would introduce an added risk into the mission. Instead, we focus on those aspects that can allow the rover controllers on ground to specify the rover’s response to a range of possible operating situations that may arise. In particular, we consider the following criteria essential for robust rover autonomy:

[...] communication events with no human intervention and no assistance from ground-based programs.

• Need for understandability. Mission designers must be convinced that the proposed approach will improve the quality of the mission. If the approach is complex and unclear to the mission designer, it is unlikely to be used.

• Multiple objectives. A deployed planetary rover must balance considerations such as navigation with science, communication, resource consumption, and fault recovery. A rover must be able to operate as an integrated system.

The ability of future rovers to achieve the more ambitious goals of future missions will depend critically on robust autonomous operation. Each of the criteria for robust autonomy can be seen to be lacking to some extent with current mission-ready rovers:

Robust flexible execution: Currently, a rover command sequence is specified to the lowest level of detail and leaves few, if any, choices to be made at execution time; hence, it admits only one, or a very small number of, valid execution behaviors. This simplifies the execution process but does not allow execution to be responsive to the dynamic status of the rover and the environment. This inflexibility can cause reduced productivity and execution failures.

For example, in Sol 22 of the mission, Sojourner received the following challenging instruction sequence:
1. Back up to the rock named Soufflé.
2. Place the arm with the spectrometer on the rock.
3. Do extensive measurements on the rock surface.
4. Perform a long traverse to another rock.
At the next communication opportunity, the news came in from Mars. The good news was that the sequence was executed to completion, including the longest traverse ever done in one day, a world’s record for Mars. However, there was bad news with the good. The rover stopped short of the rock and the spectrometer was left hanging out in mid-air rather than placed on the rock. The rover sensors indicated that the spectrometer was not in contact with the rock, but the rover continued with its spectral measurements. Not only was the spectrometer data useless, but the rover spent over six hours running the spectrometer, using time and energy that could have been used otherwise. The rock was never again visited and thus that science opportunity was lost. A more desirable behavior would be for the rover either to stop and take pictures to aid the ground team’s diagnosis of the [...]

Failure recovery: Current rovers have very limited capabilities for recovering from faults or anomalous situations. The rover’s response to most execution failures is to halt all activity and wait for the ground operations team to determine what went wrong and uplink a recovery plan. Depending on the nature of the anomalous situation and the quality of the downlinked information for the purposes of diagnosis, this ground-based recovery process can cost days of rover idle time and lost science opportunity. For example, a rover traverse may fail with a high wheel current combined with the wheel encoder showing no movement. In this case, the wheel may be stuck on a rock or the encoder may be broken. The ground team will need to uplink diagnostic sequences to determine which one is the case, wait for the results to be downlinked, and eventually uplink recovery sequences to remove the rover from the rock if that is the problem. Valuable science opportunities are lost during this time. Even transient problems such as an overheated motor can cause plan failure, where perhaps a brief pause to cool down would be sufficient to remedy the problem.

In the 1996 Rocky 7 field tests, rocks got caught in the wheels, causing wheels to jam and drag on the ground [2,3]. This was diagnosed in the desert by human observers noticing the track that the stuck wheel left in the soil, and it was fixed by prying out the rock with tools. On a more remote site, an automatic fault diagnosis and recovery system would allow the rover to continue, or at least to stop before damaging itself and to request help from Earth.

Research on outdoor robots

A number of rovers have been built for Earth-based missions or to test research ideas on realistic platforms. Much of the work on outdoor robotics has focused on advances in the robotic hardware necessary to operate in the unstructured,
difficult environment that an outdoor setting presents. For example, the main contribution of the Ambler robot is a novel form of leg-based locomotion [4]. This work is complementary to the work in this paper, which concentrates on the software architecture needed for robust autonomous operations.

Many outdoor robots are designed to be largely teleoperated [5,6,7], concentrating on the ability to navigate autonomously to a given waypoint, avoiding obstacles along the way. The performance of these robots is impressive in terms of distance traveled [6] and environmental hazards overcome [7], but the problem was restricted so that the robots did not have to exhibit robust behavior with respect to science goal achievement.

The ADS autonomous land vehicle [8] generated contingent plans over uncertain terrain. The work was restricted to path planning, and a prior, possibly inaccurate map of the environment was assumed to exist for generating paths.

Research on indoor robots

In contrast to the deployed missions and even outdoor robots, results from research labs on indoor robots would suggest that mobile rovers can operate reliably and autonomously over a wide range of conditions and a wide range of time, demonstrating impressive abilities for navigation, fault recovery, replanning, and science acquisition. However, the state of the art in outdoor rover technology is significantly more modest. The reasons lie in the considerations for mission-ready rovers stated above, along with the difficulty in assembling an integrated rover architecture from a number of disparate, individual research results.

Mission-ready rovers must operate in environments that are largely unknown and unstructured. Many indoor robots rely on an accurate map of the environment. Furthermore, the environment is often assumed to have straight walls, flat floors, and right angles at corners. These assumptions are obviously not true for outdoor environments, and the approaches are not easily extended to the general case.

Map learning has been proposed as a method for overcoming the lack of an accurate map [9]. However, this is difficult in unstructured environments with significant motion error. Moreover, the usefulness of a map-learning phase is dubious in an environment where exploration and science tasks are foremost; the extra time necessary to build accurate navigation maps will have a severe impact on the overall science return.

Many architectures have been proposed for autonomous robots, relying largely on artificial intelligence techniques for merging high-level and reactive control of the rover [10,11,12,13,14]. Very few architectures address the issues arising in remote, planetary rovers [15,16,17]. In general, the architectures fail in the areas of understandability or safety. The more complex AI techniques that promise higher levels of autonomous operations often come with no guarantee on behavior, and their operation is not easily predictable. Bottom-up approaches that rely heavily on programming and interacting behaviors [18,19] become equally unpredictable when the system becomes large.

These architectures assume tightly coupled interaction between different elements of the architecture (integrated planning and execution) or between a human and the rover system (teleoperation). Since it is not currently feasible to put a full planner on board a planetary rover, either option fails the requirement of limited communication.

Research rovers, both indoors and out, are tested repeatedly under similar environments and situations to demonstrate the usefulness of the rover architectures under those conditions. Based on the test results, models are refined, parameters adjusted, and the rover control software modified. However, the only environment that truly matches that of a planetary mission is on the planet. Once there, the rover has a limited lifetime, which must be devoted to mission goals, leaving no time for further testing and refinement.

Rover research technology often addresses one problem, such as navigation, in isolation. A deployed rover must, however, balance considerations such as navigation with science, communication, resource consumption, and fault recovery. An integrated system presents challenges that single technologies do not address. Nonetheless, the research literature contains many results that are of potential value to missions. Some have influenced current rover designs, but many more remain unused. Individual technologies, in addition to those mentioned above, include verifiable real-time control [20,21,22] and human-computer interaction [23].

The next generation of mission-ready rovers

The next generation of rovers, the first of which is expected to launch in 2003, will be more flexible than Sojourner. Instead of simple command sequences, the rovers will execute complex contingency plans, which tell the rover explicitly what to do if something goes wrong. They will also execute plans more robustly, so minor problems such as incorrect resource estimates or motor overheating do not cause complete failure. Finally, they will be able to identify and diagnose internal faults and recover from simple failures.

To imagine how these rovers will behave, consider again the problem of backing up to a rock, deploying a sensor, gathering data, and moving on. We can imagine what a smart rover would need to do in this case. First of all, the operation of backing up on Sojourner was brittle because it used a simple “try three times” strategy. We can imagine a more robust operation of backing up until contact, with some timeout in case it hits an insurmountable difficulty. The backing-up operation should include the ability to try different approach paths if obstacles block the planned route. The rover should also be able to take a moment to let an overheating motor cool down rather than abandoning the operation. In addition, the rover should notice that a wheel seems to be malfunctioning and pulling the rover off-path, and it should autonomously shift control algorithms to compensate. Certainly the rover should verify correct instrument placement before doing hours of measurements; furthermore, the rover should have alternative plans in case it cannot make contact with the rock despite its best efforts. While it is performing measurements, the rover should monitor its energy level to make sure that it will have enough energy left to send its data to Earth during the next communication event. It should cut short the measurements if it ends up in the shade of a larger rock and cannot charge its battery enough to complete the task and also communicate.
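
As an illustration only, the backing-up behavior described above (back up until contact, with a timeout, alternative approach paths, and a pause for an overheated motor) could be sketched as follows. The rover interface (`in_contact`, `motor_overheated`, `cool_down`, `step_back`), the `SimRover` stand-in, the path names, and the step-based timeout are all hypothetical names introduced for this sketch; they are not part of the architecture described in this paper.

```python
from enum import Enum


class Outcome(Enum):
    CONTACT = "contact"
    TIMEOUT = "timeout"
    BLOCKED = "blocked"


class SimRover:
    """Toy stand-in for rover hardware (entirely hypothetical)."""

    def __init__(self, distance, blocked_paths=(), overheat_at=None):
        self.distance = distance            # steps remaining to the rock
        self.blocked_paths = set(blocked_paths)
        self.overheat_at = overheat_at      # distance at which the motor overheats once
        self.hot = False
        self.log = []

    def in_contact(self):
        return self.distance == 0

    def motor_overheated(self):
        if self.overheat_at == self.distance:
            self.overheat_at = None
            self.hot = True
        return self.hot

    def cool_down(self):
        self.hot = False
        self.log.append("cool_down")

    def step_back(self, path):
        if path in self.blocked_paths:
            return False                    # obstacle on this approach path
        self.distance -= 1
        return True


def back_up_until_contact(rover, path, timeout_steps):
    """Back up along one path until contact, a block, or the timeout."""
    for _ in range(timeout_steps):
        if rover.in_contact():
            return Outcome.CONTACT
        if rover.motor_overheated():
            rover.cool_down()               # pause rather than abandon the operation
            continue
        if not rover.step_back(path):
            return Outcome.BLOCKED
    return Outcome.TIMEOUT


def approach_rock(rover, paths, timeout_steps=50):
    """Try each approach path in turn; report whether contact was verified."""
    for path in paths:
        if back_up_until_contact(rover, path, timeout_steps) is Outcome.CONTACT:
            return True
    return False                            # caller should fall back to an alternative plan
```

The caller is expected to start measurements only when `approach_rock` returns `True`, which corresponds to verifying instrument placement before spending hours on data collection.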

The technology needed to construct such a rover is within reach using artificial intelligence technology in development today.

3. ARCHITECTURE OVERVIEW

We now discuss our rover autonomy architecture, which is specifically designed to accommodate the particular constraints of planetary rovers while supporting autonomous operations. The architecture builds on elements of the Remote Agent architecture [24], extending and modifying existing elements and adding new elements as needed. The Remote Agent has been applied to the problem of spacecraft control, in particular a technology experiment on the Deep Space One mission [25]. The differences between spacecraft in the relatively predictable environment of space and a rover interacting with an unknown planetary surface have led to the particular elements of the current architecture.

The rover autonomy architecture consists of four major reasoning components (see Figure 1): a contingency planner/scheduler (CPS), a conditional executive (CX), a resource manager (RM), and a model-based mode identification and reconfiguration system (MIR). In the current system, CPS is a ground-based planner, which is given high-level science goals and generates a temporally flexible schedule along with contingency plans for possible execution failures. The planning capabilities will incrementally migrate on board as advances in time-constrained planning and mission-qualified processor power allow and as the need for autonomy increases. The contingent schedule is refined with help from a rover operator and then sent to the on-board executive CX. These commands are sent to the real-time control system, with results coming back via state monitors into MIR. MIR’s mode identification layer infers the system state from the monitored information and updates the state for CX. If commands fail or schedule constraints are violated, CX tries to recover using the contingency plans. In the future, CX will also be able to call MIR’s mode reconfiguration layer to produce a recovery plan for unanticipated failures.

Figure 1: Rover autonomy architecture
[Diagram labels. Ground: scientists, scientist interfaces, science goals, schedule, rover operator, rover operator interface, planner/scheduler, simulator. Rover: uplinked sequence, executive, resource manager (requests, conflicts), commands, mode identification, state information, rover real-time system.]

Contingency Planner/Scheduler

Throughout a mission, detailed mission operations plans must be constructed, validated, and uplinked to a spacecraft or rover. Currently a mission operations plan takes the form of a rigid, time-stamped sequence of low-level commands. Unfortunately, there is uncertainty about many aspects of task execution: exactly how long operations will take, how much power will be consumed, and how much data storage will be needed. Furthermore, there is uncertainty about environmental factors that influence such things as rate of battery charging or which scientific tasks are possible. In order to allow for this uncertainty, current plans are based on worst-case estimates and contain fail-safe checks. If tasks take less time than expected, the spacecraft or rover just waits for the next time-stamped task. If tasks take longer than expected, they may be terminated before completion. In fact, all non-essential operations may be halted until a new command sequence is received. All of these situations result in unnecessary delays and lost science opportunities.

To account for execution uncertainty, CPS can actively plan for, and take advantage of, possible contingencies. Thus, if an operation takes longer than a certain amount of time, or the power remaining drops below a specified value, a different pre-planned sequence of operations can be performed. Building contingency plans is, in general, intractable, and so contingency planners tend to be slow [26,27,28]. To overcome this problem, CPS employs the Just-in-Case (JIC) planning approach [29], originally
developed to generate contingent observation schedules for automated telescopes.

The basic idea of JIC is to take an existing schedule and look for the places where it is most likely to fail. The JIC scheduler then generates alternative schedules for each of those situations. The JIC scheduler starts with a sequence of tasks, where each task must be performed in a certain temporal window. However, there is uncertainty in how long a particular task may take, and this can lead to potential failures of the schedule. For example, an execution failure could result if one task finished sufficiently late that the next task’s start window has already passed.

CPS operates at two levels. At the lower level, CPS builds straight-line (non-contingent) schedules of the tasks that it is given. It does this by piecing together a schedule, one task at a time. CPS uses a local search strategy to determine which task should be added to the schedule next, and where. The local search strategy has the advantage of being an anytime algorithm: over time, CPS will produce schedules of increasing quality.

At the higher level, CPS actually builds the contingent schedule using this lower level as a subroutine. An initial straight-line schedule is built first, then contingent branches are iteratively added (see Figure 2). In each iteration, the point in the contingent schedule that is most likely to fail is selected. Then, the low-level scheduler is called to build a new straight-line schedule, given the breaking condition and the schedule prefix up until the breakpoint as an initial seed. The resulting schedule is then integrated into the existing contingent schedule and the iteration is complete.

Figure 2: The scheduling process. Initially, the scheduler builds a straight-line schedule assuming expected outcomes (a). The scheduler then finds a likely breakpoint; in the example, the task T2 is found to often take too long, preventing T3 from executing. The schedule up to the offending task, plus the failure condition, is fed back into the scheduler (b). The output is a new straight-line schedule that is the contingent alternative. The original and new schedules are then merged back into one (c), with an explicit test of the contingency added in after task T2.
[Panel labels: (a) The initial schedule; "T2 used too much time" fed to the scheduler; (c) The new schedule.]

The initial JIC work dealt only with one resource: time. For rover operations, uncertainty about other resources can also lead to potential failures in a plan. For example, if a task uses more power than expected, or the battery has not charged as much as expected, there may be insufficient power available for the subsequent tasks. However, there may be enough power left to do other useful tasks. Thus, we have extended JIC to also consider power consumption and data production when it is searching for a probable break point. Furthermore, we consider both when a resource is overdrawn (a task takes too long) and when a resource is available in surplus (a task required less power than expected). This ability allows the scheduler to build contingent plans that take advantage of unexpected surplus.

Conditional Executive

The conditional executive (CX) is responsible for interpreting the command sequence coming from ground control, checking run-time resource requirements and availability, monitoring plan execution, and potentially selecting alternative plan branches if the situation changes.

The input to CX consists of the primary plan and a set of alternate plans. The primary plan contains a nominal sequence and a set of contingent branches (see Figure 3). The nominal sequence is the sequence that will be executed if there are no deviations from the a priori expectations of the environment and actions. The contingent branches specify alternative courses of action. Within any contingent branch there may be further contingent branches, hence the primary plan is a tree of alternative courses of action.

Figure 3. A plan with a nominal sequence (top) and a set of contingent branches.

The alternate plans are not attached to particular points in the primary plan but are rather applicable at any time their conditions are satisfied; in some sense they are global contingent branches, whereas the contingent branches within a plan are local to their position in the plan. When enabled, each alternate plan can either replace the executing plan or insert itself between actions of the executing plan. Enabling events can include unexpected opportunities, plan failures, or conditions such as resource shortfalls and component degradation.

CX starts by executing the nominal sequence of the primary plan. At each point in time, CX may have multiple options,
corresponding to the eligible branches of a branch point and the enabled alternate sequences. CX chooses the option with the highest estimated expected utility, computed over the remainder of the plan. The utility of successfully completing an atomic action is set by operators on the ground. From this atomic utility and a model of the probabilities of various events (such as a traverse taking longer than anticipated), the expected utility of an entire branching sequence can be estimated. This expected utility is initially computed by the ground planner CPS; the utility could be updated by an on-board plan revision component at run time to reflect changes in resource availability, system state, or the environment.

CX receives state information from the mode identification module MIR and resource information from the resource manager RM. It uses this information to check action preconditions and maintenance conditions, as well as to check the preconditions of the alternate plan library. The ability to branch on any state or resource condition provides the sequence writer with a powerful language for describing activities.

CX extends its own robust activity mechanism using the fault recovery mechanisms of MIR. CX may request recovery commands from MIR and integrate those commands into the procedure for the currently executing action. If MIR cannot suggest a recovery, the action has failed, and depending on instructions in the plan, CX either ignores the failure or aborts the executing plan and checks for applicable alternate plans. In the case that no alternate plans apply, CX aborts the entire plan set and puts the rover into a stable standby mode.

As an example of flexible execution, consider the following plan for a day’s traverse: the rover is to the south of a small ridge, trying to head generally north. The uplinked primary plan specifies the following course of action:
• Travel north to the top of the ridge
• Choose between the options:
  • Nominal option, highest utility (precondition: there must be a path)
    • Continue to the north
    • Downlink to ground at sundown
  • Contingent option, lower utility
    • Move back down the ridge
    • Travel east scanning for a pass
    • Downlink to ground at sundown
Because operators have a good view of the slope of the ridge, they decide to precisely define the low-level navigation for that segment. They can upload a time-stamped set of motion commands as a sequence. In this case, CX is operating like a traditional sequencer.

At the top of the ridge, the rover’s on-board navigation system judges that there is no safe path to the north. The nominal plan option is not eligible to execute because its precondition has failed, so the contingent option is selected.

As the rover executes the procedure for its action to move back down the ridge, the mode identification component of MIR notices an inconsistency between the commanded speed of the left front wheel and the current draw of its motor, and reports the anomaly to CX. CX invokes the procedure’s predefined exception handler for that anomaly, which in turn requests a recovery program from the mode reconfiguration component of MIR. The recovery program resets the motor microcontroller and resolves the anomaly. The entire MIR cycle has happened within the scope of a single action of the plan, below the level of abstraction of the off-board planning system CPS.

Returning to the bottom of the ridge, the rover continues with its uplinked sequence by beginning its traverse to the east. An hour before sundown, the vision system picks up an unusual green patch on a nearby rock. The ground operators had supplied an alternate plan for this serendipitous science opportunity and had given it a high priority. After considering the relative utility estimates for the options of continuing the current plan or inserting the alternate plan to examine the rock before the next action, CX chooses to examine the rock.

After sidetracking to the rock, CX returns to the primary plan just before sundown. The next two actions, “Travel east” and “Downlink,” both make requests for time and power from the resource manager RM. It turns out that because of the sidetrack, the total energy requested by these actions exceeds the battery energy stored during the day. RM reports the conflict to CX. The ground operators had given a very high priority to the “Downlink” action and specified that it should run even if previous actions failed, so CX chooses to cancel the lower utility “Travel east” action.

After the “Travel east” command has been cancelled, CX is ready to call the downlink action, but since the plan requires that the downlink begin at a specific time, CX waits until the beginning of that window to execute the downlink. Ground operators can also request that CX wait for an arbitrary predicate to become true, such as the antenna being turned on, before executing an action.

Resource Management

Resources on rovers are severely limited and at the same time critical for mission success. Solar energy is the primary source of power, but downlink events may be scheduled when there is little or no sunlight, so the power must be managed such that the on-board battery is sufficiently charged to communicate. There are more opportunities to take pictures and instrument measurements than there is space on board or communication bandwidth, thus space needs to be managed so that the most important data are stored and sent back.

Command sequences are sent at periodic intervals, usually daily. Sequence writers (or automatic planning and scheduling tools) must make conservative estimates of the resource usage to avoid oversubscribing the resource (for example, draining the batteries). But overly conservative
estimates may not make full use of the resources, leaving the rover idle or passing up science opportunities that were in fact possible. Conversely, overly optimistic estimates may lead to fault conditions that will at best break the plan and at worst damage the rover.

The underlying problem is that resources cannot be estimated precisely for an entire day since the rover’s interactions with the environment are complex. Only during the execution of the sequence will it become clear how much of each resource is indeed available. If a traverse completes quickly because of unexpectedly easy terrain or good traction, more tasks may be possible with the surplus time and power. Conversely, if the rover traverses a ridge, slanted away from the sun, the accumulated energy may be insufficient to run all the planned experiments and also communicate with ground. The rover would then have to discard some of the experiments to reserve energy for communication.

We have designed an on-board, run-time resource manager that receives estimated resource profile information from tasks, monitors current and planned resource usage, and reacts to changes in resource availability. The resource manager is largely transparent to sequence writers, while allowing them to take full advantage of the resources available. The following are the primary contributions of the resource manager:

• On-board resource conflict and opportunity detection. The resource manager is able to notice when there are differences between predicted and available resources, both in the present and the future. A [...]

Interaction between resource management and sequence execution: The resource manager RM in the rover autonomy architecture communicates with the sequence execution component CX. Each step in the nominal sequence has an expected resource profile associated with it. The ground-based planner/scheduler CPS uses the expected profiles and the expected resource availability to construct the sequence. Under normal conditions, the initial sequence will respect the resource availability (although this is not assumed, in case the resource availability changes before uplink).

CX sends the expected profiles to RM, which records them and checks for conflicts (see Figure 4). Any conflicts are signalled to CX. CX in turn can respond in a variety of ways, depending on the severity and immediacy of the conflict. CX can fail the plan, select an alternate plan from its plan library, or ignore the conflict (for example, in the case that it is a future conflict with a low priority task).

As the sequence is executing, the estimates of resource usage and availability become concrete. Resource monitors gather the information about the real usage and availability and send that to RM. Based on this new information, conflicts or opportunities may arise, which RM will in turn signal to CX.

Conflict detection and recovery: RM stores resource profiles using a timeline-based representation. Each timeline is a set of non-intersecting time intervals over which a constant amount of the resource is used (see Figure 5).
conditional sequencer that branches on resource The predicted resource availability, given by system
conditions can take advantage of changes and run the resource information and models of resource availability
plan that best conforms to the known resource
information.
Note that detection, recovery, and opportunity exploitation over time (for example, solar flux models), defines a
can operate on future resource requirements and predicted timeline of the maximum available resource during any time
availability. This will provide support for more flexible task interval.
scheduling and, in the long term, for further on-board
autonomy such as automatic scheduling, sequence Each resource request includes a profile of predicted
generation, and fault recovery. resource usage. The sum of all the granted resource requests
itself defines a timeline, and this resource request timeline
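The timeline bookkeeping in this subsection can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the RM implementation: the names Request, level_at, find_conflicts, and tasks_to_drop are hypothetical, larger priority numbers are assumed to mean more important tasks, and the greedy lowest-priority-first removal is a simple stand-in for whatever minimization RM actually performs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """A granted resource request (hypothetical structure)."""
    task: str
    priority: int      # assumption: larger number = more important
    profile: tuple     # ((start, end, amount), ...), piecewise constant


def level_at(profile, t):
    """Amount given by a piecewise-constant profile at time t (0 outside it)."""
    return sum(amount for start, end, amount in profile if start <= t < end)


def find_conflicts(availability, granted):
    """Return (start, end, overage) for intervals where the summed
    request timeline exceeds the availability timeline."""
    profiles = [availability] + [r.profile for r in granted]
    points = sorted({t for p in profiles for s, e, _ in p for t in (s, e)})
    conflicts = []
    # Every timeline is constant between consecutive breakpoints, so it is
    # enough to sample each interval at its left endpoint.
    for start, end in zip(points, points[1:]):
        used = sum(level_at(r.profile, start) for r in granted)
        available = level_at(availability, start)
        if used > available:
            conflicts.append((start, end, used - available))
    return conflicts


def tasks_to_drop(availability, granted):
    """Greedy sketch: repeatedly drop the lowest-priority task in the
    conflict set until the request timeline fits the availability."""
    kept = list(granted)
    dropped = []
    while True:
        conflicts = find_conflicts(availability, kept)
        if not conflicts:
            return dropped
        start, _, _ = conflicts[0]
        conflict_set = [r for r in kept if level_at(r.profile, start) > 0]
        victim = min(conflict_set, key=lambda r: r.priority)
        kept.remove(victim)
        dropped.append(victim.task)
```

For example, with 10 units available until time 10 and 4 units afterwards, a 3-unit measurement that overlaps a 3-unit communication window produces a conflict, and the lower-priority measurement is the task proposed for removal. The sketch assumes non-negative availability; it does not model the refusal of a new request or resource borrowing.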
A conflict will arise whenever the request timeline exceeds the available timeline. This may occur because of availability changes or because of a new request. In either case, the set of tasks using that resource during the time interval where the conflict occurs forms the conflict set. This conflict set is then minimized to find a set of lowest-priority tasks that, when removed, will resolve the conflict. In the case of a new request, the tasks must be lower priority than the new task; if the conflict cannot otherwise be resolved, the request is refused, thus resolving the conflict. If a minimal conflict set is found that will resolve the conflict, RM sends that set to the executive CX, which will react to the conflict. For example, CX might remove those tasks from its planned sequence, potentially even aborting the sequence if those tasks were necessary for sequence success.

Resource borrowing—RM has the ability to borrow resources from tasks to satisfy new resource requests. For tasks that can operate in a number of modes, or background tasks that can be preempted, the resource profile includes an indication of how much of the resource requested could be given up without aborting the task. For example, an image could be compressed with quality loss to free up data storage for higher priority images, but without completely

instance. In the Pathfinder mission, the ground operations team turned off specific accelerometers during particular maneuvers, and changed variables that influence the rover’s reaction to minor faults. While this is one solution to the context dependence of sensor data, it is a fairly limited solution.

• Simply halting and calling home for help is unnecessarily conservative for many common faults. Often, the rover could recover from the failure and keep going or else perform actions that do not depend on (or affect) the damaged component. Doing so depends on knowing the nature of the fault.

To address these problems, the rover autonomy architecture makes use of diagnosis, using all available sensor data to infer whether the rover is behaving correctly and, in some cases, to infer the specific fault. When it is possible to infer the fault, and that fault is recoverable, the rover can execute the appropriate recovery plan; otherwise, it can always shut down and call home for help. Diagnosis, taking into account the complete state of the rover, allows faults to be detected earlier and also reduces the number of false alarms.

We refer to the diagnosis and recovery component of the rover autonomy architecture as MIR, for Mode

[Figure: recovery actions (e.g., stop and back up); model-based reactive planner; mode diagram with states NOMINAL and STATIONARY and transitions Drive, Stop, Power off]

reactive planner (see Figure 6), called Burton [31], which uses a universal plan compiled from the models to quickly determine the next action to execute, based on the current state and the recovery request. The time required to generate a complete recovery plan is linear in the length of the plan.
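To illustrate why a precompiled universal plan makes plan generation linear in plan length, here is a minimal Python sketch. It is not Burton's algorithm, which compiles its plan from the component models in a more structured way; it only shows the general idea of a table that maps every state to its next action for a given goal. The mode names, the TRANSITIONS table, and both function names are hypothetical.

```python
from collections import deque

# Hypothetical mode model: TRANSITIONS[mode][action] -> next mode.
TRANSITIONS = {
    "off":        {"power_on": "stationary"},
    "stationary": {"drive": "driving", "power_off": "off"},
    "driving":    {"stop": "stationary"},
}


def compile_universal_plan(transitions, goal):
    """Backward breadth-first search from the goal mode.

    Returns a table mapping every mode that can reach the goal to the
    first action of a shortest path toward it.
    """
    # Invert the model: for each mode, which (predecessor, action) lead to it.
    incoming = {}
    for mode, actions in transitions.items():
        for action, nxt in actions.items():
            incoming.setdefault(nxt, []).append((mode, action))

    policy = {}
    seen = {goal}
    queue = deque([goal])
    while queue:
        mode = queue.popleft()
        for prev, action in incoming.get(mode, []):
            if prev not in seen:
                seen.add(prev)
                policy[prev] = action
                queue.append(prev)
    return policy


def recovery_plan(transitions, policy, mode, goal):
    """Follow the table from the current mode: one lookup per plan step."""
    plan = []
    while mode != goal:
        action = policy[mode]   # KeyError if the goal is unreachable
        plan.append(action)
        mode = transitions[mode][action]
    return plan
```

The table is built once per goal; at run time each step is a single dictionary lookup, so producing a complete plan costs time proportional to the number of actions in it.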
The ability to decompose actions using a rich procedural representation is a key point of the Remote Agent architecture and will help in the rover autonomy architecture to provide robust implementations of sequenced actions.

In support of this active testing, MIR can make use of its models, both to determine when there are multiple competing diagnoses and to identify activities it can perform that will rule out or confirm certain hypotheses. Reasoning
about the information to be gained by executing actions exceeds the ability of the MIR system designed for the Remote Agent, but we are working to provide that capability.

Work in active testing for diagnosis [35] is typically based on probe selection for circuit diagnosis, and it relies on certain simplifying assumptions that are valid for circuits but not for rovers. Some of the key assumptions are:
1. Measurements do not affect the state of the system being diagnosed.
2. All measurements have equal cost.
3. The goal of making measurements is to eliminate ambiguity as quickly as possible (i.e., to minimize the total number of measurements); the order of measurements is otherwise irrelevant.
These assumptions lead to a minimum entropy measure for probe selection. The next probe selected is the one that results in the lowest expected entropy of the probability distribution of diagnoses. This policy tends to minimize the total cost of measurements, under the assumptions listed above. However, these assumptions do not hold in the rover domain, for the following reasons, so minimizing entropy is not sufficient.
1. Any information that can be obtained without changing the state of the rover, as long as it is not too expensive to compute, is already continuously available to MI. Any additional tests involve causal action, such as spinning a wheel or taking a picture from a camera.
2. On a rover, some sensing actions may have very high cost, including the possibility of causing some undesirable side effects, while others are relatively cheap.
3. In the rover autonomy architecture, the main purpose of diagnosis is to disambiguate the rover state enough to find an appropriate recovery plan. Thus, not all ambiguities are equal: the value of information depends on the value of the recovery it supports. In the case of multiple faults, one fault may be more critical and need immediate response, meaning measurements relevant to that fault have priority. If several candidate faults have the same recovery procedure, fully disambiguating the fault may even be unimportant.

We are exploring a modification of the minimum-entropy model, which ranks measurements according to the recovery actions they support and penalizes measurements based on the cost of the corresponding sensing actions.

6. CONCLUSIONS

The particular characteristics of Mars rover operations require a significant level of rover autonomy and an ability to handle resource constraints and unpredictable events. We have designed an integrated architecture for rovers that includes contingency planning on ground and flexible, robust execution of conditional sequences on board. The on-board executive draws on model-based fault diagnosis, active sensing, and dynamic resource management to maximize its science return.

In the future, we would like to see rovers that are capable of even higher levels of autonomous operations. These rovers will accept very high-level goals from human operators and will be able to achieve those goals with no further supervision, even in dynamic and uncertain environments. These rovers will be self-diagnosing and self-repairing; they will be capable of detecting gradual degradation, adjusting internal parameters accordingly, and performing preventive maintenance to avoid catastrophic failure. For example, solar panels accumulate dust over time and are gradually damaged by UV; the ability of the rover to execute a plan will be dependent on its gradually decreasing energy production, and it may need to perform actions to remove dust periodically when operating over long time intervals. In addition to expected and predictable degradation, rovers will be able to automatically replan when unexpected problems occur or serendipitous opportunities arise.

We are building toward this vision in our research on robust autonomous rovers. While we are not there yet, it is reasonable to expect such capabilities in future generations of rovers.

7. ACKNOWLEDGEMENTS

We would like to thank the following people for their contributions:

Michael Sims for giving us the opportunity to participate in the Marsokhod field test and for his very helpful comments on this paper. Hans Thomas and the rest of the Intelligent Mechanisms Group at NASA Ames Research Center for helping us integrate our architecture with the Marsokhod user interface and control software.

Katherine Smith for her ongoing work to implement MIR monitors and models. Barney Pell for his contributions to the initial resource management ideas. David Miller, the IMG, and the Exec team for their involvement in the discussions defining the SCS language. Jim Kurien and Chris Plaunt for their help in understanding and reusing existing Remote Agent code.

REFERENCES
[1] A. H. Mishkin, J. C. Morrison, T. T. Nguyen, H. W. Stone, B. K. Cooper, and B. H. Wilcox, “Experiences with Operations and Autonomy of the Mars Pathfinder Microrover,” Proceedings of the IEEE Aerospace Conference, 1998.
[2] S. Hayati and R. Arvidson, “Long range science rover (Rocky 7) Mojave Desert field tests,” i-SAIRAS, 1997.
[3] R. Arvidson and S. Hayati. Mojave field experiments for Rocky 7 prototype Mars rover. URL: http://wundow.wustl.edu/rocky7.
[4] J. Bares, M. Hebert, T. Kanade, E. Krotkov, T. Mitchell, R. Simmons, and W. Whittaker. “Ambler: An autonomous
rover for planetary exploration.” IEEE Computer, June 1989.
[5] D. Christian, D. Wettergreen, M. Bualat, K. Schwehr, D. Tucker, and E. Zbinden. “Field experiments with the Ames Marsokhod rover.” In Proceedings of the 1997 Field and Service Robotics Conference, December 1997.
[6] D. Wettergreen, C. Thorpe, and W. Whittaker, “Exploring Mount Erebus by walking robot,” Robotics and Autonomous Systems, 11(3-4), pp. 171-185, December 1993.
[7] D. Bapna, E. Rollins, J. Murphy, E. Maimone, W. Whittaker, and D. Wettergreen, “The Atacama Desert trek: Outcomes,” IEEE International Conference on Robotics and Automation, pp. 597-604, May 1998.
[8] T. A. Linden and J. Glickman, “Contingency Planning for an Autonomous Land Vehicle,” Proceedings of IJCAI-87, 1987.
[9] S. Thrun, D. Fox, and W. Burgard, “Probabilistic mapping of an environment by a mobile robot,” IEEE International Conference on Robotics and Automation, pp. 1546-1551, May 1998.
[10] B. Hayes-Roth, K. Pfleger, P. Lalanda, P. Morignot, and M. Balabanovic, “A domain-specific software architecture for adaptive intelligent systems,” IEEE Transactions on Software Engineering, 21(4), pp. 288-301, April 1995.
[11] D. E. Wilkins, K. L. Myers, J. D. Lowrance, and L. P. Wesley, “Planning and Reacting in Uncertain and Dynamic Environments,” Journal of Experimental and Theoretical AI, 7(1), pp. 197-227, 1995.
[12] R. Simmons, “An architecture for coordinating planning, sensing and action,” in Innovative Approaches to Planning, Scheduling, and Control, pp. 292-297, 1990.
[13] R. Alami, R. Chatila, S. Fleury, M. Ghallab, F. Ingrand, “An architecture for autonomy,” International Journal of Robotic Research, 17(4), pp. 315-337, April 1998.
[14] R. P. Bonasso, D. Kortenkamp, D. Miller, and M. Slack. “Experiences with an architecture for intelligent, reactive agents.” In Proceedings of IJCAI-95, 1995.
[15] D. B. Smith and J. R. Matijevic, “A System Architecture for a Planetary Rover.” Proceedings of the NASA Conference on Space Telerobotics, JPL Publication 89-7, 1989.
[16] R. Chatila, S. Lacroix, T. Simeon, and M. Herrb. “Planetary exploration by a mobile robot: mission teleprogramming and autonomous navigation.” Autonomous Robots, 2(4):333-344, 1995.
[17] G. Giralt and L. Boissier, “The French planetary rover VAP: concept and current developments.” Proceedings of the IEEE International Conference on Intelligent Robots and Systems (IROS ’92), pp. 1391-1398, 1992.
[18] Brooks
[19] E. Gat, R. Desai, R. Ivlev, J. Loch, and D. Miller. “Behavior control for robotic exploration of planetary surfaces.” IEEE Transactions on Robotics and Automation, 10(4), August 1994.
[20] J.-J. Borrelly, E. Coste-Manière, B. Espiau, K. Kapellos, R. Pissard-Gibollet, D. Simon, and N. Turro, “The ORCCAD Architecture.” The International Journal of Robotics Research, 17(4), pp. 338-359, April 1998.
[21] S. A. Schneider, V. W. Chen, G. Pardo-Castellote, H. H. Wang, “ControlShell: A Software Architecture for Complex Electromechanical Systems,” The International Journal of Robotics Research, 17(4), pp. 360-380, April 1998.
[22] D. Musliner, E. Durfee, and K. Shin. “Circa: A cooperative, intelligent, real-time control architecture.” IEEE Transactions on Systems, Man, and Cybernetics, 23(6), 1993.
[23] D. C. MacKenzie and R. C. Arkin, “Evaluating the Usability of Robot Programming Toolsets,” The International Journal of Robotics Research, 17(4), pp. 381-401, April 1998.
[24] N. Muscettola, P. P. Nayak, B. Pell, and B. C. Williams. “Remote agent: To boldly go where no AI system has gone before.” Artificial Intelligence, 103(1/2), August 1998.
[25] D. E. Bernard, G. A. Dorais, C. Fry, E. B. Gamble Jr., R. Kanefsky, J. Kurien, W. Millar, N. Muscettola, P. P. Nayak, B. Pell, K. Rajan, N. Rouquette, B. Smith, and B. C. Williams, “Design of the remote agent experiment for spacecraft autonomy,” Proceedings of the IEEE Aerospace Conference, Snowmass, CO, 1998.
[26] D. Draper, S. Hanks, and D. Weld. “Probabilistic planning with information gathering and contingent execution.” In Proc. 2nd Intl. Conf. AI Planning Systems, June 1994.
[27] L. Pryor and G. Collins. “Planning for contingencies: A decision-based approach.” J. Artificial Intelligence Research, 1996.
[28] D. S. Weld, C. R. Anderson, and D. E. Smith. “Extending Graphplan to handle uncertainty & sensing actions.” In Proceedings of AAAI-98, pages 897-904, 1998.
[29] M. Drummond, J. Bresina, and K. Swanson. “Just-in-case scheduling.” In Proceedings of the 12th National Conference on Artificial Intelligence, 1994.
[30] B. C. Williams and P. Nayak, “A Model-based Approach to Reactive Self-Configuring Systems,” Proceedings of AAAI-96, 1996.
[31] B. C. Williams and P. P. Nayak, “A reactive planner for a model-based executive,” Proceedings of IJCAI-97, 1997.
[32] J. de Kleer and B. C. Williams, “Diagnosis With Behavioral Modes,” Proceedings of IJCAI-89, 1989.
[33] D. S. Weld and J. de Kleer, Readings in Qualitative Reasoning About Physical Systems, Morgan Kaufmann Publishers, Inc., San Mateo, California, 1990.
[34] J. de Kleer and B. C. Williams, Artificial Intelligence, Volume 51, Elsevier, 1991.
[35] J. de Kleer and B. C. Williams, “Diagnosing Multiple Faults,” Artificial Intelligence, Vol 32, Number 1, 1987.
Richard Washington is a Research Scientist in the Computational Sciences Division at NASA Ames Research Center. He holds a Ph.D. in Computer Science from Stanford University. His research interests include planning, plan execution, reasoning under uncertainty and autonomous robots. He is a member of AAAI and ACM.

Keith Golden is a Research Scientist in the Computational Sciences Division at NASA Ames Research Center. He holds a Ph.D. in Computer Science from the University of Washington. His research interests include planning, knowledge representation, sensing and information gathering, software agents and rover autonomy. He is a member of AAAI.

1996 Pacific Regional ACM programming contest. He is a member of AAAI.

Trey Smith is a member of the CMU Field Robotics Center. He holds a B.S. in Computer Science from Carnegie Mellon University. He is working on the CMU Mars Autonomy Project. His primary research interests are robotics and artificial intelligence.