Quiz: Reinforcement Learning: Question 1 of 5
QUESTION 1 OF 5
Supervised learning
Unsupervised learning
QUESTION 2 OF 5
The piece of software you are training that makes decisions in an environment to
reach a goal.
QUESTION 3 OF 5
TRUE
QUESTION 4 OF 5
If an agent doesn't explore enough, it often sticks to information it's already learned,
even if this knowledge doesn't help the agent achieve its goal.
The agent can use information from previous experiences to help it make future
decisions that enable it to reach its goal.
The reward function gives an agent all the information it needs to reach its
goal.
QUESTION 5 OF 5
Instructions
Train a model in the AWS DeepRacer console and interpret its reward graph.
Part 1: Train a reinforcement learning model using the AWS
DeepRacer console
Practice the knowledge you've learned by training your first reinforcement learning
model using the AWS DeepRacer console.
1. If this is your first time using AWS DeepRacer, choose Get started from the
service landing page, or choose Get started with reinforcement
learning from the main navigation pane.
2. On the Get started with reinforcement learning page, under Step 2:
Create a model and race, choose Create model. Alternatively, on the AWS
DeepRacer home page, choose Your models from the main navigation pane
to open the Your models page. On the Your models page, choose Create
model.
3. On the Create model page, under Environment simulation, choose a track
as a virtual environment to train your AWS DeepRacer agent. Then,
choose Next. For your first run, choose a track with a simple shape and
smooth turns. In later iterations, you can choose more complex tracks to
progressively improve your models. To train a model for a particular racing
event, choose the track most similar to the event track.
4. On the Create model page, choose Next.
5. On the Create model page, under Race type, choose a training type. For
your first run, choose Time trial. The agent with the default sensor
configuration with a single-lens camera is suitable for this type of racing
without modifications.
6. On the Create model page, under Training algorithm and
hyperparameters, choose the Soft Actor Critic (SAC) or Proximal Policy
Optimization (PPO) algorithm. In the AWS DeepRacer console, SAC models
must be trained in continuous action spaces. PPO models can be trained in
either continuous or discrete action spaces.
7. On the Create model page, under Training algorithm and
hyperparameters, use the default hyperparameter values as is. Later on, to
improve training performance, expand the hyperparameters and experiment
with modifying the default hyperparameter values.
8. On the Create model page, under Agent, choose The Original
DeepRacer or The Original DeepRacer (continuous action space) for your
first model. If you use Soft Actor Critic (SAC) as your training algorithm, we
filter your cars so that you can conveniently choose from a selection of
compatible continuous action space agents.
9. On the Create model page, choose Next.
10. On the Create model page, under Reward function, use the default reward
function example as is for your first model. Later on, you can choose Reward
function examples to select another example function and then choose Use
code to accept the selected reward function.
11. On the Create model page, under Stop conditions, leave the
default Maximum time value as is or set a new value to terminate the
training job to help prevent long-running (and possibly runaway) training
jobs. When experimenting in the early phase of training, you should start
with a small value for this parameter and then progressively train for longer
amounts of time.
12. On the Create model page, choose Create model to start creating the model
and provisioning the training job instance.
13. After the submission, watch your training job being initialized and then run.
The initialization process takes about 6 minutes to change status
from Initializing to In progress.
14. Watch the Reward graph and Simulation video stream to observe the
progress of your training job. You can choose the refresh button next
to Reward graph periodically to refresh the Reward graph until the training
job is complete.
Note: The training job is running on the AWS Cloud, so you don't need to keep the
AWS DeepRacer console open during training. However, you can come back to the
console to check on your model at any point while the job is in progress.
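The default reward function mentioned in step 10 typically rewards the agent for staying close to the centerline. As a rough sketch (the exact default code in the console may differ), it looks something like this:

```python
def reward_function(params):
    """Example centerline reward: the closer the car stays to the
    center of the track, the higher the reward."""
    # Input parameters provided by the DeepRacer environment
    track_width = params['track_width']
    distance_from_center = params['distance_from_center']

    # Markers at increasing distances from the centerline
    marker_1 = 0.1 * track_width
    marker_2 = 0.25 * track_width
    marker_3 = 0.5 * track_width

    if distance_from_center <= marker_1:
        reward = 1.0      # very close to the center
    elif distance_from_center <= marker_2:
        reward = 0.5
    elif distance_from_center <= marker_3:
        reward = 0.1
    else:
        reward = 1e-3     # far from the center, likely off track

    return float(reward)
```

The tiered structure is the important idea: the reward function maps the agent's state at each step to a number, and the agent learns behavior that maximizes the total reward it accumulates.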
Exercise Solution
To get a sense of how well your training is going, watch the reward graph. Here is a
list of its parts and what they do:
Average reward
This graph represents the average reward the agent earns during a
training iteration. The average is calculated by averaging the reward
earned across all episodes in the training iteration. An episode begins
at the starting line and ends when the agent completes one loop
around the track or at the place the vehicle left the track or collided
with an object. Toggle the switch to hide this data.
Average percentage completion (training)
The training graph represents the average percentage of the track
completed by the agent in all training episodes in the current training.
It shows the performance of the vehicle while experience is being
gathered.
Average percentage completion (evaluation)
While the model is being updated, the performance of the existing
model is evaluated. The evaluation graph line is the average
percentage of the track completed by the agent in all episodes run
during the evaluation period.
Best model line
This line allows you to see which of your model iterations had the
highest average progress during the evaluation. The checkpoint for
this iteration will be stored. A checkpoint is a snapshot of a model that
is captured after each training (policy-updating) iteration.
Reward primary y-axis
This shows the reward earned during a training iteration. To read the
exact value of a reward, hover your mouse over the data point on the
graph.
Percentage track completion secondary y-axis
This shows you the percentage of the track the agent completed
during a training iteration.
Iteration x-axis
This shows the number of iterations completed during your training
job.
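The averaged quantities plotted on the reward graph are simple per-iteration means over episodes. A minimal sketch (illustrative field names; the console computes these for you):

```python
def iteration_averages(episodes):
    """Average reward and track completion across the episodes of
    one training iteration, as plotted on the reward graph."""
    n = len(episodes)
    avg_reward = sum(e['reward'] for e in episodes) / n
    avg_completion = sum(e['pct_complete'] for e in episodes) / n
    return avg_reward, avg_completion

# One training iteration containing three episodes
episodes = [
    {'reward': 120.0, 'pct_complete': 30.0},   # went off track early
    {'reward': 200.0, 'pct_complete': 65.0},
    {'reward': 310.0, 'pct_complete': 100.0},  # completed a full lap
]
print(iteration_averages(episodes))  # (210.0, 65.0)
```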
List of reward graph parts and what they do
No improvement
In the next example, we can see that the percentage of track completion hasn't
risen above roughly 15 percent, even though the model has been training for quite
some time, probably around 6,000 iterations. This is not a good sign! Consider
discarding this model and reward function and trying a different strategy.
No improvement
A well-trained model
In the following example graph, we see the evaluation percentage completion
reached 100% a while ago, and the training percentage reached 100% roughly 100
or so iterations ago. At this point, the model is well trained. Training it further might
lead to the model becoming overfit to this track.
Avoid overfitting
Overfitting or overtraining is a really important concept in machine learning. With
AWS DeepRacer, this can become an issue when a model is trained on a specific
track for too long. A good model should be able to make decisions based on the
features of the road, such as the sidelines and centerlines, and be able to drive on
just about any track.
An overtrained model, on the other hand, learns to navigate using landmarks
specific to an individual track. For example, the agent turns a certain direction when
it sees uniquely shaped grass in the background or a specific angle the corner of the
wall makes. The resulting model will run beautifully on that specific track, but
perform badly on a different virtual track, or even on the same track in a physical
environment due to slight variations in angles, textures, and lighting.
Well-trained - Avoid overfitting
Adjust hyperparameters
The AWS DeepRacer console's default hyperparameters are quite effective, but
occasionally you may consider adjusting the training hyperparameters.
The hyperparameters are variables that essentially act as settings for the training
algorithm that control the performance of your agent during training. We learned,
for example, that the learning rate controls how many new experiences are
counted in learning at each step.
In this reward graph example, the training completion graph and the reward graph
are swinging high and low. This might suggest an inability to converge, which may
be helped by adjusting the learning rate. Imagine the current weight for a given
node is .03, the optimal weight is .035, and the learning rate is set to
.01. The next training iteration would then swing past the optimum to .04, and the
following iteration would swing under it to .03 again. If you suspect this, you can
reduce the learning rate to .001. A lower learning rate makes learning take longer
but can help increase the quality of your model.
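The oscillation described above can be reproduced with a toy update rule using the hypothetical numbers from the text (a simplification: real optimizers scale the gradient rather than stepping by a fixed amount):

```python
def train(weight, optimal, lr, steps):
    """Toy update rule: step the weight toward the optimum by a fixed
    amount lr each iteration, overshooting whenever lr is too large."""
    history = [weight]
    for _ in range(steps):
        direction = 1 if optimal > weight else -1
        weight = round(weight + direction * lr, 6)
        history.append(weight)
    return history

# lr = .01 overshoots the optimum of .035 and oscillates forever
print(train(0.03, 0.035, 0.01, 4))   # [0.03, 0.04, 0.03, 0.04, 0.03]
# lr = .001 converges more slowly but steadily approaches .035
print(train(0.03, 0.035, 0.001, 5))  # [0.03, 0.031, 0.032, 0.033, 0.034, 0.035]
```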
Adjust hyperparameters
Introduction to Generative AI
A generative model aims to answer the question, "Have I seen data like this before?"
In our image classification example, we might still use a generative model by
framing the problem in terms of whether an image with the label "cat" is more
similar to data you’ve seen before than an image with the label "no cat."
However, generative models can be used to support a second use case. The
patterns learned by generative models can be used to create brand-new examples of
data that look similar to the data the model has seen before.
Discriminative versus Generative algorithms
Generative AI Models
In this lesson, you will learn how to create three popular types of generative
models: generative adversarial networks (GANs), general autoregressive
models, and transformer-based models. Each of these is accessible through AWS
DeepComposer to give you hands-on experience with using these techniques to
generate new examples of music.
Autoregressive models
Autoregressive convolutional neural networks (AR-CNNs) are used to study systems
that evolve over time and assume that the likelihood of some data depends only on
what has happened in the past. It’s a useful way of looking at many systems, from
weather prediction to stock prediction.
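The autoregressive idea, predicting the next value purely from past values, can be sketched with a simple linear AR model (illustrative coefficients; this is not what an AR-CNN actually learns, but it shows the dependence on the past):

```python
def ar_predict(history, coeffs):
    """Linear autoregressive prediction: the next value is a
    weighted sum of the most recent p values."""
    p = len(coeffs)
    recent = history[-p:]
    return sum(c * x for c, x in zip(coeffs, recent))

# An AR(2) model where x[t] = 2*x[t-1] - x[t-2]
# extrapolates the trend of a straight line.
series = [1.0, 2.0, 3.0, 4.0]
next_value = ar_predict(series, coeffs=[-1.0, 2.0])
print(next_value)  # 5.0
```

In music generation, the "history" is the notes produced so far, and the model repeatedly predicts the next note conditioned on that history.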
Transformer-based models
Transformer-based models are most often used to study data with some sequential
structure (such as the sequence of words in a sentence). Transformer-based
methods are now a common modern tool for modeling natural language.
We won't cover this approach in this course but you can learn more about
transformers and how AWS DeepComposer uses transformers in AWS
DeepComposer learning capsules.
AWS DeepComposer
Summary
For the input track, you can use a sample track, record a custom track, or import a
track.
Input track
For the ML technique, you can use either a sample model or a custom model.
Each AWS DeepComposer Music studio experience supports three different
generative AI techniques: generative adversarial networks (GANs), autoregressive
convolutional neural networks (AR-CNNs), and transformers.
Summary
In this demo, you went through the AWS DeepComposer console where you can
learn about deep learning, input your music, and train deep learning models to
create new music.
You don't need to participate in this challenge to finish this course, but the course
teaches everything you need to win in both challenges we launched this season.
Regardless of your background in music or ML, you can find a competition just right
for you.
In the Basic challenge, “Melody-Go-Round”, you can use any machine learning
technique in the AWS DeepComposer Music studio to create new
compositions.
In the Advanced challenge, “Melody Harvest”, you train a custom generative
AI model using Amazon SageMaker.
GANs with AWS DeepComposer
Summary
We’ll begin our journey of popular generative models in AWS DeepComposer with
generative adversarial networks or GANs. Within an AWS DeepComposer GAN,
models are used to solve a creative task: adding accompaniments that match the
style of an input track you provide. Listen to the input melody and the output
composition created by the AWS DeepComposer GAN model:
Input melody
Output melody
A generator is a neural network that learns to create new data resembling the
source data on which it was trained.
A discriminator is another neural network trained to differentiate between
real and synthetic data.
The generator and the discriminator are trained in alternating cycles. The generator
learns to produce more and more realistic data while the discriminator iteratively
gets better at learning to differentiate real data from the newly created data.
The GAN models that AWS DeepComposer uses work in a similar fashion. There are
two competing networks working together to learn how to generate musical
compositions in distinctive styles.
A GAN's generator produces new music, as the orchestra does. The
discriminator judges whether the music the generator creates is realistic and provides
feedback on how to make its data more realistic, just as a conductor provides
feedback to make an orchestra sound better.
An orchestra and its conductor
Training Methodology
Let’s dig one level deeper by looking at how GANs are trained and used within AWS
DeepComposer. During training, the generator and discriminator work in a tight
loop as depicted in the following image.
Generator
The generator takes in a batch of single-track piano rolls (melody) as the
input and generates a batch of multi-track piano rolls as the output by adding
accompaniments to each of the input music tracks.
The discriminator then takes these generated music tracks and predicts how
far they deviate from the real data present in the training dataset. This
deviation is called the generator loss. This feedback from the discriminator is
used by the generator to incrementally get better at creating realistic output.
Discriminator
As the generator gets better at creating music accompaniments, it begins
fooling the discriminator. So, the discriminator needs to be retrained as well.
The discriminator measures the discriminator loss to evaluate how well it is
differentiating between real and fake data.
Beginning with the discriminator on the first iteration, we alternate training these
two networks until we reach some stop condition; for example, the algorithm has
seen the entire dataset a certain number of times or the generator and
discriminator loss reach some plateau (as shown in the following image).
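The alternating loop above can be illustrated with a deliberately tiny numerical toy (every name and the update rules here are placeholders, not the actual AWS DeepComposer implementation): the "generator" is a single number trying to emit values near the real data, and the "discriminator" is a decision boundary that keeps adapting as the generator improves.

```python
import random

def train_toy_gan(real_mean=5.0, epochs=300, lr=0.05, seed=1):
    """Toy alternating loop in the shape of GAN training.
    Illustrative only: the 'networks' are single parameters."""
    random.seed(seed)
    g = 0.0   # generator parameter: the value it emits
    d = 0.0   # discriminator parameter: its decision boundary
    for _ in range(epochs):
        real = real_mean + random.gauss(0, 0.1)
        fake = g + random.gauss(0, 0.1)   # generator produces a sample
        # Discriminator step: move the boundary between real and fake data
        d += lr * ((real + fake) / 2 - d)
        # Generator step: the discriminator's feedback (generator loss)
        # pushes the fake data toward the real data
        generator_loss = d - fake
        g += lr * generator_loss
    return g, d

g, d = train_toy_gan()
print(round(g, 2), round(d, 2))  # both end up near 5.0
```

Note the structure mirrors the description above: each epoch trains the discriminator first, then uses its feedback to improve the generator, and training stops after a fixed number of passes.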
New Terms
Generator: A neural network that learns to create new data resembling the
source data on which it was trained.
Discriminator: A neural network trained to differentiate between real and
synthetic data.
Generator loss: Measures how far the output data deviates from the real
data present in the training dataset.
Discriminator loss: Evaluates how well the discriminator differentiates
between real and fake data.
Supporting Materials
Input Twinkle Twinkle Input
Output Twinkle Twinkle Rock
Quiz: Generative AI
QUESTION 1 OF 4
"Edit event" refers to a note added to the input track during inference.
QUESTION 2 OF 4
The generator learns to produce more realistic data and the discriminator learns to
differentiate real data from the newly created data.
The discriminator learns from both real Bach music and realistic Bach music.
The generator is responsible for both creating new music and providing
feedback.
QUESTION 3 OF 4
QUESTION 4 OF 4
False
Important
To get you started, AWS DeepComposer provides a 12-month Free Tier for
first-time users. With the Free Tier, you can perform up to 500 inference
jobs, translating to 500 pieces of music, using the AWS DeepComposer Music
studio. You can use one of these instances to complete the exercise at no
cost. For more information, please read the AWS account requirements page.
Demo Part 1:
Demo Part 2:
Summary
In the demo, you learned how to create music using AWS DeepComposer.
You will need a music track to get started, and there are several ways to provide one.
You can record your own using the AWS keyboard device or the virtual keyboard provided in
the console, or you can input a MIDI file or choose a provided music track.
Once you have input the music track, choose "Continue" to create a model. The models
you can choose are AR-CNN, GAN, and transformers. Each of them has a slightly
different function. After choosing a model, you can then adjust the parameters used
to train the model.
Once you are done with model creation, you can choose "Continue" to listen to and
improve your output melody. To edit the melody, you can either drag or extend
notes directly on the piano roll, or adjust the model parameters and train the model again.
Keep tuning your melody until you are happy with it, then choose "Continue" to finish
the composition.
If you want to enhance your music further with another generative model, you can
do that, too. Simply choose a model under the "Next step" section and create a new
model to enhance your music.