

Quiz: Reinforcement Learning

QUESTION 1 OF 5

In which type of machine learning are models trained using labeled data?

Reinforcement learning

 Supervised learning

Unsupervised learning
SUBMIT

QUESTION 2 OF 5

In reinforcement learning, what is an "agent"?



A tool for designing an incentive plan which specifies which actions will be
rewarded.

 The piece of software you are training that makes decisions in an environment to
reach a goal.

A popular supervised learning technique that is used to predict continuous
values, such as house prices.
SUBMIT

QUESTION 3 OF 5

TRUE or FALSE: In reinforcement learning, "Exploration" is using experience to decide.
 FALSE

TRUE
SUBMIT

QUESTION 4 OF 5

How does a balance of "Exploration" and "Exploitation" help a reinforcement learning model?
 The more an agent learns about its environment, the more confident it becomes
about the actions it chooses.

 If an agent doesn't explore enough, it often sticks to information it's already learned,
even if this knowledge doesn't help the agent achieve its goal.
 The agent can use information from previous experiences to help it make future
decisions that enable it to reach its goal.

An agent should not explore an environment because random actions just lead to low reward.

The reward function gives an agent all the information it needs to reach its
goal.
SUBMIT

QUESTION 5 OF 5

Submit to check your answer choices!


State: The current position within the environment that is visible, or known, to an agent.
Action: For every state, an agent needs to do this toward achieving its goal.
Episode: Represents a period of trial and error when an agent makes decisions and gets
feedback from its environment.
Reward: Feedback given to an agent for each action it takes in a given state.
Environment: The surrounding area our agent interacts with.
SUBMIT

NEXT

Exercise: Interpret the reward graph of your first AWS DeepRacer model

Instructions
Train a model in the AWS DeepRacer console and interpret its reward graph.
Part 1: Train a reinforcement learning model using the AWS
DeepRacer console
Practice the knowledge you've learned by training your first reinforcement learning
model using the AWS DeepRacer console.

1. If this is your first time using AWS DeepRacer, choose Get started from the
service landing page, or choose Get started with reinforcement
learning from the main navigation pane.
2. On the Get started with reinforcement learning page, under Step 2:
Create a model and race, choose Create model. Alternatively, on the AWS
DeepRacer home page, choose Your models from the main navigation pane
to open the Your models page. On the Your models page, choose Create
model.
3. On the Create model page, under Environment simulation, choose a track
as a virtual environment to train your AWS DeepRacer agent. Then,
choose Next. For your first run, choose a track with a simple shape and
smooth turns. In later iterations, you can choose more complex tracks to
progressively improve your models. To train a model for a particular racing
event, choose the track most similar to the event track.
4. On the Create model page, choose Next.
5. On the Create Model page, under Race type, choose a training type. For
your first run, choose Time trial. The agent with the default sensor
configuration with a single-lens camera is suitable for this type of racing
without modifications.
6. On the Create model page, under Training algorithm and
hyperparameters, choose the Soft Actor Critic (SAC) or Proximal Policy
Optimization (PPO) algorithm. In the AWS DeepRacer console, SAC models
must be trained in continuous action spaces. PPO models can be trained in
either continuous or discrete action spaces.
7. On the Create model page, under Training algorithm and
hyperparameters, use the default hyperparameter values as is. Later on, to
improve training performance, expand the hyperparameters and experiment
with modifying the default hyperparameter values.
8. On the Create model page, under Agent, choose The Original
DeepRacer or The Original DeepRacer (continuous action space) for your
first model. If you use Soft Actor Critic (SAC) as your training algorithm, we
filter your cars so that you can conveniently choose from a selection of
compatible continuous action space agents.
9. On the Create model page, choose Next.
10. On the Create model page, under Reward function, use the default reward
function example as is for your first model. Later on, you can choose Reward
function examples to select another example function and then choose Use
code to accept the selected reward function.
11. On the Create model page, under Stop conditions, leave the
default Maximum time value as is, or set a new value to terminate the
training job and help prevent long-running (and possibly runaway) training
jobs. When experimenting in the early phase of training, you should start
with a small value for this parameter and then progressively train for longer
amounts of time.
12. On the Create model page, choose Create model to start creating the model
and provisioning the training job instance.
13. After the submission, watch your training job being initialized and then run.
The initialization process takes about 6 minutes to change status
from Initializing to In progress.
14. Watch the Reward graph and Simulation video stream to observe the
progress of your training job. You can choose the refresh button next
to Reward graph periodically to refresh the Reward graph until the training
job is complete.
Note: The training job is running on the AWS Cloud, so you don't need to keep the
AWS DeepRacer console open during training. However, you can come back to the
console to check on your model at any point while the job is in progress.

Part 2: Inspect your reward graph to assess your training progress


As you train and evaluate your first model, you'll want to get a sense of its quality.
To do this, inspect your reward graph.

Find the following on your reward graph:


 Average reward
 Average percentage completion (training)
 Average percentage completion (evaluation)
 Best model line
 Reward primary y-axis
 Percentage track completion secondary y-axis
 Iteration x-axis
Review the solution to this exercise for ideas on how to interpret it.

Exercise Solution
To get a sense of how well your training is going, watch the reward graph. Here is a
list of its parts and what they do:

 Average reward
 This graph represents the average reward the agent earns during a
training iteration. The average is calculated by averaging the reward
earned across all episodes in the training iteration. An episode begins
at the starting line and ends when the agent completes one loop
around the track or at the place the vehicle left the track or collided
with an object. Toggle the switch to hide this data.
 Average percentage completion (training)
 The training graph represents the average percentage of the track
completed by the agent in all training episodes in the current training.
It shows the performance of the vehicle while experience is being
gathered.
 Average percentage completion (evaluation)
 While the model is being updated, the performance of the existing
model is evaluated. The evaluation graph line is the average
percentage of the track completed by the agent in all episodes run
during the evaluation period.
 Best model line
 This line allows you to see which of your model iterations had the
highest average progress during the evaluation. The checkpoint for
this iteration will be stored. A checkpoint is a snapshot of a model that
is captured after each training (policy-updating) iteration.
 Reward primary y-axis
 This shows the reward earned during a training iteration. To read the
exact value of a reward, hover your mouse over the data point on the
graph.
 Percentage track completion secondary y-axis
 This shows you the percentage of the track the agent completed
during a training iteration.
 Iteration x-axis
 This shows the number of iterations completed during your training
job.
List of reward graph parts and what they do

Reward Graph Interpretation


The following four examples give you a sense of how to interpret the success of
your model based on the reward graph. Learning to read these graphs is as much of
an art as it is a science and takes time, but reviewing the following four examples
will give you a start.

Needs more training


In the following example, we see there have only been 600 iterations, and the
graphs are still going up. We see the evaluation completion percentage has just
reached 100%, which is a good sign but isn’t fully consistent yet, and the training
completion graph still has a ways to go. This reward function and model are
showing promise, but need more training time.

Needs more training

No improvement
In the next example, we can see that the percentage of track completions hasn't
gone above roughly 15 percent, and the model has been training for quite some
time, probably around 6,000 iterations. This is not a good sign! Consider throwing
this model and reward function away and trying a different strategy.

No improvement

A well-trained model
In the following example graph, we see the evaluation percentage completion
reached 100% a while ago, and the training percentage reached 100% roughly 100
or so iterations ago. At this point, the model is well trained. Training it further might
lead to the model becoming overfit to this track.

Avoid overfitting
Overfitting or overtraining is a really important concept in machine learning. With
AWS DeepRacer, this can become an issue when a model is trained on a specific
track for too long. A good model should be able to make decisions based on the
features of the road, such as the sidelines and centerlines, and be able to drive on
just about any track.
An overtrained model, on the other hand, learns to navigate using landmarks
specific to an individual track. For example, the agent turns a certain direction when
it sees uniquely shaped grass in the background or a specific angle the corner of the
wall makes. The resulting model will run beautifully on that specific track, but
perform badly on a different virtual track, or even on the same track in a physical
environment due to slight variations in angles, textures, and lighting.
Well-trained - Avoid overfitting

Adjust hyperparameters
The AWS DeepRacer console's default hyperparameters are quite effective, but
occasionally you may consider adjusting the training hyperparameters.
The hyperparameters are variables that essentially act as settings for the training
algorithm that control the performance of your agent during training. We learned,
for example, that the learning rate controls how many new experiences are
counted in learning at each step.
In this reward graph example, the training completion graph and the reward graph
are swinging high and low. This might suggest an inability to converge, which may
be helped by adjusting the learning rate. Imagine if the current weight for a given
node is .03, and the optimal weight should be .035, but your learning rate was set to
.01. The next training iteration would then swing past optimal to .04, and the
following iteration would swing under it to .03 again. If you suspect this, you can
reduce the learning rate to .001. A lower learning rate makes learning take longer
but can help increase the quality of your model.
Adjust hyperparameters

Good Job and Good Luck!


Remember: training experience helps both the model and the reinforcement learning
practitioner become a better team. Enter your model in the monthly AWS
DeepRacer League races for chances to win prizes and glory while improving your
machine learning development skills!
NEXT

Introduction to Generative AI

Generative AI and Its Applications


Generative AI is one of the biggest recent advancements in artificial intelligence
because of its ability to create new things.

Until recently, the majority of machine learning applications were powered by
discriminative models. A discriminative model aims to answer the question, "If I'm
looking at some data, how can I best classify this data or predict a value?" For
example, we could use discriminative models to detect if a camera was pointed at a
cat.
As we train this model over a collection of images (some of which contain cats and
others which do not), we expect the model to find patterns in images which help
make this prediction.

A generative model aims to answer the question, "Have I seen data like this before?"
In our image classification example, we might still use a generative model by
framing the problem in terms of whether an image with the label "cat" is more
similar to data the model has seen before than an image with the label "no cat."
However, generative models can also support a second use case: the patterns
learned by a generative model can be used to create brand-new examples of data
that look similar to the data it has seen before.
Discriminative versus Generative algorithms

Generative AI Models
In this lesson, you will learn how to create three popular types of generative
models: generative adversarial networks (GANs), general autoregressive
models, and transformer-based models. Each of these is accessible through AWS
DeepComposer to give you hands-on experience with using these techniques to
generate new examples of music.

Autoregressive models
Autoregressive convolutional neural networks (AR-CNNs)  are used to study systems
that evolve over time and assume that the likelihood of some data depends only on
what has happened in the past. It’s a useful way of looking at many systems, from
weather prediction to stock prediction.

Generative adversarial networks (GANs)


Generative adversarial networks (GANs) are a machine learning model architecture
that pits two networks against each other to generate new content. The
training algorithm alternates between training a generator
network (responsible for producing new data) and a discriminator
network (responsible for measuring how closely the generator network's data
represents the training dataset).

Transformer-based models
Transformer-based models are most often used to study data with some sequential
structure (such as the sequence of words in a sentence). Transformer-based
methods are now a common modern tool for modeling natural language.
We won't cover this approach in this course, but you can learn more about
transformers and how AWS DeepComposer uses transformers in the AWS
DeepComposer learning capsules.
NEXT

Generative AI with AWS DeepComposer

What is AWS DeepComposer?


AWS DeepComposer gives you a creative and easy way to get started with machine
learning (ML), specifically generative AI. It consists of a USB keyboard that connects
to your computer to input melodies, and the AWS DeepComposer console, which
includes AWS DeepComposer Music studio to generate music, learning capsules to
dive deep into generative AI models, and AWS DeepComposer Chartbusters
challenges to showcase your ML skills.

AWS DeepComposer

Summary

AWS DeepComposer keyboard


You don't need an AWS DeepComposer keyboard to finish this course. You can
import your own MIDI file, use one of the provided sample melodies, or use the
virtual keyboard in the AWS DeepComposer Music studio.

AWS DeepComposer music studio


To generate, create, and edit compositions with AWS DeepComposer, you use the
AWS DeepComposer Music studio. To get started, you need an input track and a
trained model.

For the input track, you can use a sample track, record a custom track, or import a
track.

Input track
For the ML technique, you can use either a sample model or a custom model.
Each AWS DeepComposer Music studio experience supports three different
generative AI techniques: generative adversarial networks (GANs), autoregressive
convolutional neural networks (AR-CNNs), and transformers.

 Use the GAN technique to create accompaniment tracks.
 Use the AR-CNN technique to modify notes in your input track.
 Use the transformers technique to extend your input track by up to 30 seconds.
ML models

Demo: AWS DeepComposer

Summary
In this demo, you went through the AWS DeepComposer console where you can
learn about deep learning, input your music, and train deep learning models to
create new music.

AWS DeepComposer learning capsules


To learn the details behind the generative AI and ML techniques used in AWS
DeepComposer, you can use easy-to-consume, bite-sized learning capsules in the
AWS DeepComposer console.
AWS DeepComposer learning capsules

AWS DeepComposer Chartbusters challenges


Chartbusters is a global challenge where you can use AWS DeepComposer to create
original compositions and compete in monthly challenges to showcase your
machine learning and generative AI skills.

You don't need to participate in this challenge to finish this course, but the course
teaches everything you need to win in both challenges we launched this season.
Regardless of your background in music or ML, you can find a competition just right
for you.

You can choose between two different challenges this season:

 In the Basic challenge, “Melody-Go-Round”, you can use any machine learning
technique in the AWS DeepComposer Music studio to create new
compositions.
 In the Advanced challenge, “Melody Harvest”, you train a custom generative
AI model using Amazon SageMaker.
NEXT

GANs with AWS DeepComposer

Summary
We'll begin our journey through popular generative models in AWS DeepComposer with
generative adversarial networks, or GANs. In AWS DeepComposer, a GAN model
is used to solve a creative task: adding accompaniments that match the
style of an input track you provide. Listen to the input melody and the output
composition created by the AWS DeepComposer GAN model:

 Input melody
 Output melody

What are GANs?


A GAN is a type of generative machine learning model which pits two neural
networks against each other to generate new content: a generator and a
discriminator.

 A generator is a neural network that learns to create new data resembling the
source data on which it was trained.
 A discriminator is another neural network trained to differentiate between
real and synthetic data.
The generator and the discriminator are trained in alternating cycles. The generator
learns to produce more and more realistic data while the discriminator iteratively
gets better at learning to differentiate real data from the newly created data.

Collaboration between an orchestra and its conductor


A simple metaphor of an orchestra and its conductor can be used to understand a
GAN. The orchestra trains, practices, and tries to generate polished music, and then
the conductor works with them, as both judge and coach. The conductor judges the
quality of the output and at the same time provides feedback to achieve a specific
style. The more they work together, the better the orchestra can perform.

The GAN models that AWS DeepComposer uses work in a similar fashion. There are
two competing networks working together to learn how to generate musical
compositions in distinctive styles.

A GAN's generator produces new music as the orchestra does. The
discriminator judges whether the music the generator creates is realistic and
provides feedback on how to make its data more realistic, just as a conductor
provides feedback to make an orchestra sound better.
An orchestra and its conductor

Training Methodology
Let’s dig one level deeper by looking at how GANs are trained and used within AWS
DeepComposer. During training, the generator and discriminator work in a tight
loop as depicted in the following image.

A schema representing a GAN model used within AWS DeepComposer


Note: While this figure shows the generator taking input on the left, GANs in general
can also generate new data without any input.

Generator
 The generator takes in a batch of single-track piano rolls (melody) as the
input and generates a batch of multi-track piano rolls as the output by adding
accompaniments to each of the input music tracks.
 The discriminator then takes these generated music tracks and predicts how
far they deviate from the real data present in the training dataset. This
deviation is called the generator loss. This feedback from the discriminator is
used by the generator to incrementally get better at creating realistic output.

Discriminator
 As the generator gets better at creating music accompaniments, it begins
fooling the discriminator. So, the discriminator needs to be retrained as well.
The discriminator measures the discriminator loss to evaluate how well it is
differentiating between real and fake data.
Beginning with the discriminator on the first iteration, we alternate training these
two networks until we reach some stop condition; for example, the algorithm has
seen the entire dataset a certain number of times, or the generator and
discriminator losses reach a plateau (as shown in the following image).

Discriminator loss and generator loss reach a plateau

New Terms
 Generator: A neural network that learns to create new data resembling the
source data on which it was trained.
 Discriminator: A neural network trained to differentiate between real and
synthetic data.
 Generator loss: Measures how far the output data deviates from the real
data present in the training dataset.
 Discriminator loss: Evaluates how well the discriminator differentiates
between real and fake data.
Supporting Materials
 Input Twinkle Twinkle Input
 Output Twinkle Twinkle Rock
NEXT

Quiz: Generative AI
QUESTION 1 OF 4

Which of the following statements is false in the context of AR-CNNs?



2D images can be used to represent music.

AR-CNN generates output music iteratively over time.

 "Edit event" refers to a note added to the input track during inference.

Autoregressive models can be used to study weather forecasting.


SUBMIT

QUESTION 2 OF 4

Please identify which of the following statements are true about a generative
adversarial network (GAN). There may be more than one correct answer.

The generator and discriminator both use source data only.

 The generator learns to produce more realistic data and the discriminator learns to
differentiate real data from the newly created data.
 The discriminator learns from both real Bach music and realistic Bach music.

The generator is responsible for both creating new music and providing
feedback.
SUBMIT

QUESTION 3 OF 4

Which model is responsible for each of these roles in generative AI?


Submit to check your answer choices!
Evaluating the output quality: Discriminator
Creating new output: Generator
Providing feedback: Discriminator
SUBMIT

QUESTION 4 OF 4

True or false: Loss functions help us determine when to stop training a model.
 True

False
SUBMIT

NEXT

Demo: Create Music with AWS DeepComposer


Below you will find a video demonstrating how you can use AWS DeepComposer to
experiment with GAN and AR-CNN models.

Important
 To get you started, AWS DeepComposer provides a 12-month Free Tier for
first-time users. With the Free Tier, you can perform up to 500 inference
jobs, translating to 500 pieces of music, using the AWS DeepComposer Music
studio. You can use one of these inference jobs to complete the exercise at no
cost. For more information, please read the AWS account requirements page.

Demo Part 1:

Demo Part 2:

Summary
In this demo, you learned how to create music using AWS DeepComposer.
You will need a music track to get started, and there are several ways to provide
one. You can record your own using the AWS DeepComposer keyboard or the virtual
keyboard provided in the console, import a MIDI file, or choose a provided music track.

Once the music track is entered, choose "Continue" to create a model. The models
you can choose from are AR-CNN, GAN, and transformers, each with a slightly
different function. After choosing a model, you can adjust the parameters used
to train it.

Once you are done with model creation, you can select "Continue" to listen to and
improve your output melody. To edit the melody, you can either drag or extend
notes directly on the piano roll, or adjust the model parameters and train again.
Keep tuning your melody until you are happy with it, then click "Continue" to finish
the composition.

If you want to enhance your music further with another generative model, you can
do that too. Simply choose a model under the "Next step" section and create a new
model to enhance your music.

Congratulations on creating your first piece of music using AWS DeepComposer!


Now you can download the melody or submit it to a competition. We hope you enjoy
the journey of creating music with AWS DeepComposer.
NEXT
