AWS ML
Terminology
AWS DeepRacer builds on the following concepts and uses the following
terminology.
The virtual race car can take the form of the original AWS DeepRacer
device, the Evo device, or various digital rewards that can be earned by
participating in AWS DeepRacer League Virtual Circuit races. You can also
customize the virtual car by changing its color.
The original AWS DeepRacer device is a physical 1/18th-scale model car.
It has a mounted camera and an on-board compute module. The compute
module runs inference in order to drive itself along a track. The compute
module and the vehicle chassis are powered by dedicated batteries known as
the compute battery and the drive battery, respectively.
The AWS DeepRacer Evo device is the original device with an optional
sensor kit. The kit includes an additional camera and LIDAR (light
detection and ranging), which allow the car to detect objects behind and
lateral to itself. The kit also includes a new shell.
Reinforcement learning
Reinforcement learning is a machine learning method that is focused on
autonomous decision-making by an agent in order to achieve specified goals
through interactions with an environment. In reinforcement learning, learning is
achieved through trial and error and training does not require labeled input.
Training relies on the reward hypothesis, which posits that all goals can be
achieved by maximizing a future reward after action sequences. In reinforcement
learning, designing the reward function is important: the better the reward function is crafted, the better the agent can decide what actions to take to reach the goal.
For autonomous racing, the agent is a vehicle. The environment includes traveling
routes and traffic conditions. The goal is for the vehicle to reach its destination
quickly without accidents. Rewards are scores used to encourage safe and speedy
travel to the destination. The scores penalize dangerous and wasteful driving.
1. Start the agent following the current policy. The agent explores the
environment in a number of episodes and creates training data. This data
generation is an iterative process itself.
2. Apply the new training data to compute new policy gradients. Update the
network weights and continue training. Repeat Step 1 until a stop condition
is met.
Each training job produces a trained model and outputs the model artifacts to a
specified data store.
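The two numbered steps above form an iterative loop. The following minimal Python sketch illustrates that loop conceptually; the collect_episodes and update_policy helpers, the placeholder policy value, and the fixed iteration count are illustrative assumptions, not part of the AWS DeepRacer service.

import random

# Conceptual sketch of the iterative training loop described in the steps above.
# The environment, policy, and helper functions are illustrative placeholders,
# not the AWS DeepRacer service API.

def collect_episodes(policy, num_episodes):
    # Step 1: follow the current policy for several episodes and record the
    # experiences (random rewards stand in for real experiences here).
    return [[random.random() for _ in range(10)] for _ in range(num_episodes)]

def update_policy(policy, training_data):
    # Step 2: compute policy gradients from the new training data and update
    # the network weights (represented here by a single number).
    mean_reward = sum(sum(episode) for episode in training_data) / len(training_data)
    return policy + 0.01 * mean_reward

policy = 0.0
for iteration in range(5):  # repeat Step 1 and Step 2 until a stop condition is met
    training_data = collect_episodes(policy, num_episodes=20)
    policy = update_policy(policy, training_data)
# The resulting policy stands in for the trained model artifact.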
Evaluation job
An evaluation job is a workload that tests the performance of a model. Performance
is measured by given metrics after the training job is done. The standard AWS
DeepRacer performance metric is the driving time that an agent takes to complete a
lap on a track. Another metric is the percentage of the lap completed.
Racing Event Terminology
AWS DeepRacer racing events use the following concepts and terminology.
League/Competition
In the context of AWS DeepRacer League events, the
terms league and competition relate to the competition structure. AWS sponsors the
AWS DeepRacer League, which means we own it, design it, and execute it. A
competition has a start and end date.
Season
A competition can repeat in subsequent years. We call these different seasons (for
example, the 2019 season or 2020 season). Rules can change from season to
season, but are typically consistent within a season. Terms and conditions for the
AWS DeepRacer League can vary from season to season.
The Virtual Circuit
The Virtual Circuit refers to the online races happening in the AWS DeepRacer
console.
Event
As defined by the rules, an event is an AWS DeepRacer League occurrence in
which you can participate in a race. An event has a start and end date. Virtual
Circuit events typically last one month. There can be many events in a season, and
some rules—such as how we rank those participating in an event, select who wins,
and what happens thereafter—are subject to change.
Race type
In the Virtual Circuit, Open Division racers can race in time-trial (TT) races and
Pro Division racers can race in object-avoidance (OA) or head-to-head (H2H)
races. Each race type may also specify the number of laps, how racers are ranked,
and so on.
League Divisions
The AWS DeepRacer League's Virtual Circuit monthly leaderboard is split into two
skill-based divisions, Pro and Open. Each division offers its own racing formats
and opportunities for prizes to maintain a high level of overall competitiveness.
Open Division
All racers begin their machine learning journey in the Open Division. Open
Division racers compete in the time trial format and receive monthly digital
rewards for participation.
Pro Division
The Pro Division is for racers who have earned a top 10% time trial result from the
previous month. Racers in the Pro Division earn bigger rewards and can compete in
the monthly finale for qualifying seats for the yearly AWS re:Invent Championship
Cup. Pro Division racers compete in complex racing formats, such as head-to-head
or object-avoidance races, which require stereo camera or LiDAR sensor
configurations.
Train and Evaluate AWS DeepRacer Models Using the AWS DeepRacer Console
To train a reinforcement learning model, you can use the AWS DeepRacer console. In
the console, create a training job, choose a supported framework and an available
algorithm, add a reward function, and configure training settings. You can also watch
training proceed in a simulator. You can find the step-by-step instructions in Train
Your First AWS DeepRacer Model.
This section explains how to train and evaluate an AWS DeepRacer model. It also
shows how to create and improve a reward function, how an action space affects
model performance, and how hyperparameters affect training performance. You can
also learn how to clone a training model to extend a training session, how to use the
simulator to evaluate training performance, and how to address some of the
simulation-to-real-world challenges.
In general, you design your reward function to act like an incentive plan. Different incentive strategies can result in different vehicle behaviors. To make the vehicle drive faster, the function should reward the vehicle for following the track. It should dispense penalties when the vehicle takes too long to finish a lap or goes off the track. To avoid zig-zag driving patterns, it could reward the vehicle for steering less on straighter portions of the track. The reward function might give positive scores when the vehicle passes certain milestones, as measured by waypoints. This could discourage waiting or driving in the wrong direction. It is also likely that you would change the reward function to account for track conditions. However, the more your reward function takes environment-specific information into account, the more likely your trained model is to be over-fitted and less general. To make your model more generally applicable, you can explore the action space.
A good practice for creating a reward function is to start with a simple one that covers basic scenarios. You can then enhance the function to handle more actions. Let's now look at some simple reward functions.
def reward_function(params):
    # Penalize driving off the track; reward reaching the finishing line.
    if not params["all_wheels_on_track"]:
        reward = -1.0
    elif params["progress"] == 100:
        reward = 10.0
    else:
        reward = 1e-3  # small default so that reward is always defined
    return float(reward)
This logic penalizes the agent when it drives itself off the track. It rewards the agent
when it drives to the finishing line. It's reasonable for achieving the stated goal.
However, the agent roams freely between the starting point and the finishing line,
including driving backwards on the track. Not only could the training take a long time
to complete, but also the trained model would lead to less efficient driving when
deployed to a real-world vehicle.
def reward_function(params):
    # Penalize driving off the track; otherwise reward progress toward the finishing line.
    if not params["all_wheels_on_track"]:
        reward = -1.0
    else:
        reward = params["progress"]
    return float(reward)
With this function, the agent gets more reward the closer it gets to the finishing line.
This should reduce or eliminate unproductive trials of driving backwards. In general,
we want the reward function to distribute the reward more evenly over the action
space. Creating an effective reward function can be a challenging undertaking. You
should start with a simple one and progressively enhance or improve the function.
With systematic experimentation, the function can become more robust and
efficient.
Enhance Your Reward Function
After you have successfully trained your AWS DeepRacer model for the simple
straight track, the AWS DeepRacer vehicle (virtual or physical) can drive itself without
going off the track. If you let the vehicle run on a looped track, it won't stay on the
track. The reward function ignored the turning actions needed to follow the track.
To make your vehicle handle those actions, you must enhance the reward function.
The function must give a reward when the agent makes a permissible turn and
produce a penalty if the agent makes an illegal turn. Then, you're ready to start
another round of training. To take advantage of the prior training, you can start the
new training by cloning the previously trained model, passing along the previously
learned knowledge. You can follow this pattern to gradually add more features to the
reward function to train your AWS DeepRacer vehicle to drive in increasingly more
complex environments.
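For example, one way to enhance the simple progress-based function is to penalize excessive steering, which discourages illegal, drastic turns while still rewarding progress. The following is a hedged sketch; the 15-degree steering threshold and the 0.8 penalty factor are illustrative values, not recommended settings.

def reward_function(params):
    # Penalize driving off the track; otherwise reward progress toward the finishing line.
    if not params["all_wheels_on_track"]:
        return -1.0
    reward = params["progress"]

    # Penalize drastic steering so the agent learns smoother, permissible turns.
    # The threshold and penalty factor are illustrative values only.
    ABS_STEERING_THRESHOLD = 15.0  # degrees
    if abs(params["steering_angle"]) > ABS_STEERING_THRESHOLD:
        reward *= 0.8

    return float(reward)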
How would you go about training a model as robust as possible while keeping the
reward function as simple as possible? One way is to explore the action space
spanning the actions your agent can take. Another is to experiment with the hyperparameters of the underlying training algorithm. Often, you do both.
Here, we focus on how to explore the action space to train a robust model for your
AWS DeepRacer vehicle.
a_1: (v_1, s_1), ..., a_((i-1)*n + j): (v_i, s_j), ..., a_(m*n): (v_m, s_n)
That is, the action space consists of m * n discrete actions, where action a_((i-1)*n + j) pairs the i-th speed v_i with the j-th steering angle s_j. The actual values of (v_i, s_j) depend on the ranges of v_max and |s_max| and are not uniformly distributed.
Each time you begin training or iterating your AWS DeepRacer model, you must first specify n, m, v_max, and |s_max|, or agree to use their default values. Based on your choice, the AWS DeepRacer service generates the available actions your agent can choose from in training. The generated actions are not uniformly distributed over the action space.
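To make the structure of the action space concrete, the following hedged sketch enumerates m speeds and n steering angles into m * n actions. For simplicity it spaces the values evenly, whereas the actions the AWS DeepRacer service generates are not uniformly distributed.

# Illustrative only: enumerate a discrete action space of m speeds and
# n steering angles. Values are evenly spaced here for simplicity.
def build_action_space(m, n, v_max, s_max):
    speeds = [v_max * (i + 1) / m for i in range(m)]                 # v_1 .. v_m
    steering = [-s_max + 2 * s_max * j / (n - 1) for j in range(n)]  # s_1 .. s_n
    return [(v, s) for v in speeds for s in steering]                # m * n actions

actions = build_action_space(m=3, n=5, v_max=1.0, s_max=30.0)
print(len(actions))  # 15 actions: a_1 ... a_(m*n)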
In general, a larger number of actions and larger action ranges give your agent more
room or options to react to more varied track conditions, such as a curved track with
irregular turning angles or directions. The more options available to the agent, the
more readily it can handle track variations. As a result, you can expect the trained model to be more widely applicable, even when using a simple reward function.
For example, your agent can learn quickly to handle a straight-line track using a coarse-grained action space with a small number of speeds and steering angles. On a curved track, this coarse-grained action space is likely to cause the agent to overshoot and go off the track while it turns, because it does not have enough options at its disposal to adjust its speed or steering. If you increase the number of speeds, the number of steering angles, or both, the agent should become more capable of maneuvering the curves while staying on the track. Similarly, if your agent moves in a zig-zag fashion, you can try increasing the number of steering angles to reduce drastic turns at any given step.
When the action space is too large, training performance may suffer, because it
takes longer to explore the action space. Be sure to balance the benefits of a model's
general applicability against its training performance requirements. This
optimization involves systematic experimentation.
The variables affecting the training process are known as hyperparameters of the
training. These algorithm attributes are not properties of the underlying model.
Unfortunately, hyperparameters are empirical in nature. Their optimal values are not
known for all practical purposes and require systematic experimentation to derive.
Before discussing the hyperparameters that can be adjusted to tune the
performance of training your AWS DeepRacer model, let's define the following
terminology.
Episode
An episode is a period in which the vehicle starts from a given starting point and
ends up completing the track or going off the track. It embodies a sequence of
experiences. Different episodes can have different lengths.
Training data
Training data is a set of batches sampled at random from an experience buffer and used for training the policy network weights.
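To relate these terms, here is a minimal hedged sketch of how training data could be formed by sampling batches at random from an experience buffer; the buffer contents, batch size, and batch count are dummy values for illustration.

import random

# Illustrative only: an experience buffer holding experiences from several
# episodes, and training data formed as randomly sampled batches.
experience_buffer = [
    {"state": i, "action": i % 3, "reward": random.random()} for i in range(200)
]

batch_size = 64
num_batches = 3
training_data = [
    random.sample(experience_buffer, batch_size) for _ in range(num_batches)
]
print(len(training_data), "batches of", batch_size, "experiences each")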
Algorithmic hyperparameters and their effects
Gradient descent batch size
The number of recent vehicle experiences sampled at random from an experience buffer and used for updating the underlying deep-learning neural network weights. Random sampling helps reduce correlations inherent in the input data. Use a larger batch size to promote more stable and smooth updates to the neural network weights, but be aware of the possibility that the training may be longer or slower.
Required: Yes
Valid values: Positive integer of (32, 64, 128, 256, 512)
Default value: 64
Number of epochs
The number of passes through the training data to update the neural network weights during gradient descent. The training data corresponds to random samples from the experience buffer. Use a larger number of epochs to promote more stable updates, but expect slower training. When the batch size is small, you can use a smaller number of epochs.
Required: No
Valid values: Positive integer between 3 and 10
Default value: 3
Learning rate
During each update, a portion of the new weight can come from the gradient-descent (or ascent) contribution and the rest from the existing weight value. The learning rate controls how much a gradient-descent (or ascent) update contributes to the network weights. Use a higher learning rate to include more gradient-descent contributions for faster training, but be aware of the possibility that the expected reward may not converge if the learning rate is too large.
Required: No
Valid values: Real number between 0.00000001 (or 10^-8) and 0.001 (or 10^-3)
Default value: 0.0003
Entropy
A degree of uncertainty used to determine when to add randomness to the policy distribution. The added uncertainty helps the AWS DeepRacer vehicle explore the action space more broadly. A larger entropy value encourages the vehicle to explore the action space more thoroughly.
Required: No
Valid values: Real number between 0 and 1
Default value: 0.01
Discount factor
A factor that specifies how much the future rewards contribute to the expected reward. The larger the discount factor value is, the farther out the contributions the vehicle considers when making a move, and the slower the training. With a discount factor of 0.9, the vehicle includes rewards from an order of 10 future steps to make a move. With a discount factor of 0.999, the vehicle considers rewards from an order of 1,000 future steps to make a move. The recommended discount factor values are 0.99, 0.999, and 0.9999.
Required: No
Valid values: Real number between 0 and 1
Default value: 0.999
Loss type
The type of objective function used to update the network weights. A good training algorithm should make incremental changes to the agent's strategy so that it gradually transitions from taking random actions to taking strategic actions to increase the reward.
Required: No
Valid values: (Huber loss, Mean squared error loss)
Default value: Huber loss
Number of experience episodes between each policy-updating iteration
The size of the experience buffer used to draw training data from for learning policy network weights. An experience episode is a period in which the agent starts from a given starting point and ends up completing the track or going off the track. It consists of a sequence of experiences. Different episodes can have different lengths. For simple reinforcement-learning problems, a small experience buffer may be sufficient and learning is fast. For more complex problems that have more local maxima, a larger experience buffer is necessary to provide more uncorrelated data points. In this case, training is slower but more stable. The recommended values are 10, 20, and 40.
Required: No
Valid values: Integer between 5 and 100
Default value: 20
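For quick reference, the default values from the table above can be gathered in one place. The following dictionary uses descriptive key names chosen for this summary; they are not identifiers used by the AWS DeepRacer console or its APIs.

# Default hyperparameter values from the table above.
# Key names are illustrative, not AWS DeepRacer API identifiers.
default_hyperparameters = {
    "gradient_descent_batch_size": 64,
    "number_of_epochs": 3,
    "learning_rate": 0.0003,
    "entropy": 0.01,
    "discount_factor": 0.999,
    "loss_type": "Huber loss",
    "experience_episodes_between_policy_updates": 20,
}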
Examine AWS DeepRacer Training Job Progress
After starting your training job, you can examine the training metrics of rewards and track completion per episode to ascertain the training performance of your model. On the AWS DeepRacer console, the metrics are displayed in the Reward graph, as shown in the following illustration.
You can choose to view the reward gained per episode, the averaged reward per
iteration, the progress per episode, the averaged progress per iteration or any
combination of them. To do so, toggle the Reward (Episode, Average) or Progress
(Episode, Average) switches at the bottom of the Reward graph. The reward and progress per episode are displayed as scatter plots in different colors. The averaged reward and track completion are displayed as line plots and start after the first iteration.
The range of rewards is shown on the left side of the graph and the range of progress (0-100) is on the right side. To read the exact value of a training metric, move the mouse near the data point on the graph.
The graphs are automatically updated every 10 seconds while training is under way.
You can choose the refresh button to manually update the metric display.
A training job is good if the averaged reward and track completion show trends toward convergence. In particular, the model has likely converged if the progress per episode continuously reaches 100% and the reward levels out. If not, clone the model and retrain it.
In this section, you learn how to clone a trained model using the AWS DeepRacer
console.
To iterate training the reinforcement learning model using the AWS DeepRacer console
1. Sign in to the AWS DeepRacer console, if you're not already signed in.
2. On the Models page, choose a trained model and then choose Clone from
the Action drop-down menu list.
3. For Model details, do the following:
a. Type RL_model_1 in Model name, if you don't want a name to be generated
for the cloned model.
b. Optionally, provide a description for the cloned model in Model description - optional.
4. For Environment simulation, choose another track option.
5. For Reward function, choose one of the available reward function examples. Modify
the reward function. For example, consider steering.
6. Expand Algorithm settings and try different options. For example, change
the Gradient descent batch size value from 32 to 64 or increase the Learning
rate to speed up the training.
7. Experiment with different choices for the Stop conditions.
8. Choose Start training to begin a new round of training.
As with training a robust machine learning model in general, it is important that you
conduct systematic experimentation to come up with the best solution.
To test an AWS DeepRacer model with an AWS DeepRacer vehicle driving on a physical track, see Operate Your AWS DeepRacer Vehicle.
The events are logged in job-specific log streams. For a training job, the log stream
appears under the /aws/sagemaker/TrainingJobs log group. For a simulation job, the
log stream appears under the /aws/robomaker/SimulationJobs log group. For an
evaluation job submitted to a leaderboard in the AWS DeepRacer League Virtual
Circuit, the log stream appears under
the /aws/deepracer/leaderboard/SimulationJobs log group. For the reward function
execution, the log stream appears under the /aws/lambda/AWS-DeepRacer-Test-Reward-
Function log group.
Most of the log entries are self-explanatory, except for those starting with
"SIM_TRACE_LOG". An example of this log entry is shown as follows:
SIM_TRACE_LOG:0,14,3.1729,0.6200,-0.2606,-
0.26,0.50,2,0.5000,False,True,1.4878,1,17.67,1563406790.240018
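Each SIM_TRACE_LOG entry is a comma-separated record of one simulation step. The following sketch splits such an entry into fields; the field names are a commonly assumed interpretation of the trace (episode, step, position, heading, steering, speed, action, reward, and so on), not an official schema, so treat them as an assumption.

# Hedged sketch: split a SIM_TRACE_LOG entry into its comma-separated fields.
# The field names below are an assumed interpretation, not an official schema.
FIELDS = [
    "episode", "step", "x", "y", "heading", "steering_angle", "speed",
    "action", "reward", "done", "all_wheels_on_track", "progress",
    "closest_waypoint", "track_length", "timestamp",
]

def parse_sim_trace(line):
    values = line.split("SIM_TRACE_LOG:")[1].split(",")
    return dict(zip(FIELDS, values))

entry = parse_sim_trace(
    "SIM_TRACE_LOG:0,14,3.1729,0.6200,-0.2606,-0.26,0.50,2,0.5000,"
    "False,True,1.4878,1,17.67,1563406790.240018"
)
print(entry["episode"], entry["progress"], entry["reward"])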
To access the AWS DeepRacer logs, you can use the CloudWatch console, the AWS CLI, or an AWS SDK.
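For example, a minimal sketch using the AWS SDK for Python (boto3) might look like the following; the log group and AWS Region are placeholders, and your credentials must allow CloudWatch Logs read access.

import boto3

# Hedged sketch: fetch recent AWS DeepRacer training log events with boto3.
# The log group name below is one of the groups listed above; adjust as needed.
logs = boto3.client("logs", region_name="us-east-1")

log_group = "/aws/robomaker/SimulationJobs"
streams = logs.describe_log_streams(
    logGroupName=log_group, orderBy="LastEventTime", descending=True, limit=1
)
stream_name = streams["logStreams"][0]["logStreamName"]

response = logs.get_log_events(
    logGroupName=log_group, logStreamName=stream_name, limit=50
)
for event in response["events"]:
    print(event["timestamp"], event["message"])

A call to get_log_events returns a response with the same structure as the truncated example that follows.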
...
{
"timestamp": 1563407217100,
"message": "SIM_TRACE_LOG:39,218,5.6879,0.3078,-
0.1135,0.52,1.00,9,0.0000,True,False,20.7185,9,17.67,1563407216.6946
22",
"ingestionTime": 1563407217108
},
{
"timestamp": 1563407218143,
"message": "Training> Name=main_level/agent, Worker=0,
Episode=40, Total reward=61.93, Steps=4315, Training iteration=0",
"ingestionTime": 1563407218150
}
],
"nextForwardToken":
"f/34865146013350625778794700014105997464971505654143647744",
"nextBackwardToken":
"b/34865137118854508561245373892407536877673471318173089813"
}
To view AWS DeepRacer logs in the CloudWatch Logs console, open the relevant log group. To help quickly find the AWS DeepRacer-specific event logs, type one of the aforementioned log group names in the Filter box. Then choose a log stream to open the log file. To quickly locate the most recent log stream in a given log group, sort the list by Last Event Time.
Your AWS DeepRacer vehicle comes with a pre-trained model loaded into its
inference engine. Before testing your own model in the real world, verify that the
vehicle performs reasonably well with the default model. If not, check the physical
track setup. Testing a model on an incorrectly built physical track is likely to lead to poor performance. In such cases, reconfigure or repair your track before starting or resuming testing.
Note
When running your AWS DeepRacer vehicle, actions are inferred according to the
trained policy network without invoking the reward function.
If your model doesn't work well in the real world, it's possible that either the model or the track is defective. To sort out the root causes, you should first evaluate the model in simulation to check whether the simulated agent can finish at least one loop without going off the track. You can do so by inspecting the convergence of the rewards while observing the agent's trajectory in the simulator. If the reward reaches the maximum when the simulated agent completes a loop without faltering, the model is likely to be a good one.
Do not overtrain the model.
Continuing training after the model has consistently completed the track in simulation causes overfitting. An overtrained model won't perform well in the real world because it can't handle even minor variations between the simulated track and the real environment.
Use multiple models from different iterations.
A typical training session produces a range of models that fall between being
underfitted and being overfitted. Because there are no a priori criteria to determine a
model that is just right, you should pick a few model candidates from the time when
the agent completes a single loop in the simulator to the point where it performs loops
consistently.
Start slow and increase the driving speed gradually in testing.
When testing the model deployed to your vehicle, start with a small maximum speed
value. For example, you can set the testing speed limit to be <10% of the training
speed limit. Then gradually increase the testing speed limit until the vehicle starts
moving. You set the testing speed limit when calibrating the vehicle using the device
control console. If the vehicle goes too fast, that is, if its speed exceeds the speeds observed during training in the simulator, the model is not likely to perform well on the real track.
Test a model with your vehicle in different starting positions.
The model learns to take a certain path in simulation and can be sensitive to its
position within the track. You should start the vehicle tests with different positions
within the track boundaries (from left to center to right) to see if the model performs
well from certain positions. Most models tend to make the vehicle stay close to either
side of one of the white lines. To help analyze the vehicle's path, plot the vehicle's
positions (x, y) step by step from the simulation to identify likely paths to be taken by
your vehicle in a real environment.
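As a hedged illustration, assuming you have already extracted the per-step x and y values from the simulation logs (for example, with a parser like the one sketched earlier), a simple matplotlib plot might look like this; the coordinate values below are placeholders.

import matplotlib.pyplot as plt

# Illustrative only: plot the simulated vehicle positions step by step.
# xs and ys are assumed to come from parsed SIM_TRACE_LOG entries.
xs = [3.17, 3.21, 3.26, 3.32, 3.39]  # placeholder values
ys = [0.62, 0.63, 0.65, 0.68, 0.72]  # placeholder values

plt.plot(xs, ys, marker="o", linewidth=1)
plt.xlabel("x (meters)")
plt.ylabel("y (meters)")
plt.title("Simulated vehicle path")
plt.show()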
Start testing with a straight track.
A straight track is much easier to navigate compared to a curved track. Starting your
test with a straight track is useful to weed out poor models quickly. If a vehicle cannot
follow a straight track most of the time, the model will not perform well on curved
tracks, either.
Watch out for the behavior where the vehicle takes only one type of action.
When your vehicle can manage to take only one type of action, for example, steering to the left only, the model is likely over-fitted or under-fitted. With given model parameters, too many iterations in training could make the model over-fitted. Too few could make it under-fitted.
Watch out for the vehicle's ability to correct its path along a track border.
A good model makes the vehicle correct itself when nearing the track borders. Most well-trained models have this capability. If the vehicle can correct itself at both track borders, the model is considered to be more robust and of a higher quality.
Watch out for inconsistent behaviors exhibited by the vehicle.
If the vehicle takes left turns very well but fails to manage steering right, or, similarly,
if the vehicle takes only right turns well, but not left steering, you need to carefully
calibrate or recalibrate your vehicle's steering. Alternatively, you can try to use a
model that is trained with the settings close to the physical settings under testing.
Watch out for the vehicle making sudden turns and going off the track.
If the vehicle follows the path correctly most of the way, but suddenly veers off the track, it is likely due to distractions in the environment. Common distractions include unexpected or unintended light reflections. In such cases, use barriers around the track or other means to reduce glare.