HW4 Spec


Spring 2023

Introduction to Artificial Intelligence


Homework 4: Reinforcement Learning

Due Date: 2023/5/12 23:59

Introduction
Reinforcement learning is an important topic in the field of machine learning. It lets
an agent learn a policy for achieving its goal by interacting with the environment, and
it has achieved outstanding performance in computer games. In this assignment, you
will use OpenAI Gym environments (i.e., Taxi-v3 and CartPole-v0) and implement an
RL algorithm, Q learning. Your agent will learn a policy by interacting with the
environment. You will then analyze the performance and results and answer some
questions.

The goals of this programming assignment are to:

1) Implement Q learning in different ways based on the environment and the
requirements.
2) Get a taste of a complete reinforcement learning process.
3) Obtain hands-on experience with Python packages for deep learning (e.g.,
PyTorch in this assignment).
4) Learn the difference between RL and DRL.

Before you start


Please make sure you understand the concepts of Q learning and DQN. If you are
not familiar with PyTorch or DQN, please read the tutorial in the reference. In this
assignment, we will use Taxi-v3 and CartPole-v0, both provided by OpenAI Gym, as
our environments. Please refer to the documentation in the references we provide.

Setup
We recommend using Python 3.7; all the packages you need are listed in
requirements.txt. Please run the following command to install them:
pip install -r requirements.txt
File structure

📂HW4
┣ 📜requirements.txt (for setup)
┣ 📜plot.py (for experiment)
┣ 📜taxi.py (part1 source code)
┣ 📜cartpole.py (part2 source code)
┗ 📜DQN.py (part3 source code)

Implementation (50%)
You will implement some key sections of the Q learning algorithm and its variants.
These sections are marked with # Begin your code and # End your code. Please
read all the comments to understand the source code before implementing, and do
not modify the rest of the code.
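
Each stub roughly follows the pattern below; the signature shown is illustrative, and
the exact ones are defined in the provided files.

def choose_action(self, state):
    # Begin your code
    raise NotImplementedError
    # End your code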

Part 1: Q learning in Taxi-v3 (10%)


● Implement the “choose_action” function in taxi.py. The function chooses an
action according to the given state.
● Implement the “learn” function in taxi.py, one of the most essential parts of an
RL algorithm, to decide how the agent updates its Q-table (see the sketch after
this list).
● Implement the “check_max_Q” function in taxi.py to calculate the maximum Q
value of the initial state.
● The resulting learning curve should look like this:
[figure: sample Taxi-v3 learning curve]
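
For reference, a minimal sketch of the two core functions, assuming a tabular agent
with attributes such as self.qtable, self.epsilon, self.learning_rate, self.gamma, and
self.env (these names are illustrative; follow the actual signatures in taxi.py):

import numpy as np

def choose_action(self, state):
    # Epsilon-greedy: explore with probability epsilon, otherwise
    # exploit the current Q-table. (Attribute names are assumptions.)
    if np.random.uniform(0, 1) < self.epsilon:
        return self.env.action_space.sample()
    return np.argmax(self.qtable[state])

def learn(self, state, action, reward, next_state, done):
    # Tabular Q-learning update toward the TD target
    # r + gamma * max_a' Q(s', a'), dropping the bootstrap term
    # on terminal transitions.
    target = reward
    if not done:
        target += self.gamma * np.max(self.qtable[next_state])
    self.qtable[state, action] += self.learning_rate * (
        target - self.qtable[state, action]
    )

With an update of this shape, check_max_Q amounts to reading
np.max(self.qtable[initial_state]) after training.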

Part 2: Q learning in CartPole-v0 (15%)


● In this part, we use a method called “discretization” to enable Q learning in
CartPole-v0 (see the sketch after this list).
● Implement the “init_bins” function in cartpole.py to initialize the bins for
discretization.
● Implement the “discretize_value” function in cartpole.py. The function
discretizes a value with the given bins.
● Implement the “discretize_observation” function in cartpole.py to discretize an
observation in continuous space into a discrete state.
● Implement the “choose_action” and “learn” functions in cartpole.py. These
functions are similar to those you implemented in part 1.
● Implement the “check_max_Q” function in cartpole.py to calculate the
maximum Q value of the initial state.
● The resulting learning curve should look like this:
[figure: sample CartPole-v0 learning curve]
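
The discretization helpers can be as simple as np.linspace plus np.digitize. A
minimal sketch, assuming init_bins takes a value range and a bin count (check the
actual signatures in cartpole.py):

import numpy as np

def init_bins(lower_bound, upper_bound, num_bins):
    # Split [lower_bound, upper_bound] into num_bins intervals and
    # return the num_bins - 1 inner edges used by np.digitize.
    return np.linspace(lower_bound, upper_bound, num_bins + 1)[1:-1]

def discretize_value(value, bins):
    # Index of the bin that `value` falls into (0 .. num_bins - 1).
    return np.digitize(value, bins)

# Example: 4 bins over the cart-position range [-2.4, 2.4].
bins = init_bins(-2.4, 2.4, 4)      # array([-1.2, 0., 1.2])
print(discretize_value(0.5, bins))  # 2

discretize_observation would then apply discretize_value to each of the four
observation dimensions (cart position, cart velocity, pole angle, pole angular
velocity) and combine the indices into one discrete state that indexes the Q-table.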

Part 3: DQN in CartPole-v0 (25%)


● Implement the “choose_action” function in DQN.py. The function chooses an
action according to the given state.
● Implement the “learn” function in DQN.py, one of the most important parts of
an RL algorithm, to decide how the agent updates its networks (see the sketch
after this list).
● Implement the “check_max_Q” function in DQN.py to calculate the maximum
Q value of the initial state.
● The resulting learning curve should look like this:
[figure: sample DQN learning curve]
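
A sketch of the core DQN update, assuming an evaluation network self.eval_net, a
periodically synced target network self.target_net, an optimizer self.optimizer, and
batched replay tensors (all names are assumptions; DQN.py defines the real
structure):

import torch
import torch.nn as nn

def learn(self, states, actions, rewards, next_states, dones):
    # Q(s, a) of the actions actually taken; `actions` is a LongTensor
    # of shape (batch,), `dones` a 0/1 FloatTensor.
    q_values = self.eval_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)

    # TD target from the frozen target network:
    # r + gamma * max_a' Q_target(s', a'), zeroed on terminal states.
    with torch.no_grad():
        next_q = self.target_net(next_states).max(dim=1).values
        targets = rewards + self.gamma * next_q * (1 - dones)

    loss = nn.functional.mse_loss(q_values, targets)
    self.optimizer.zero_grad()
    loss.backward()
    self.optimizer.step()

    # Sync the target network every target_update_freq updates.
    self.learn_step = getattr(self, "learn_step", 0) + 1
    if self.learn_step % self.target_update_freq == 0:
        self.target_net.load_state_dict(self.eval_net.state_dict())

The target network is what separates DQN from naive Q learning with a function
approximator: bootstrapping against frozen weights keeps the regression target
stable between syncs.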
Experiment
After finishing the implementation, you can run the following command to train your
agent:

python taxi.py / python cartpole.py / python DQN.py

During training, a .npy/.pt file holding the Q table/network will be generated and
overwritten at every step. After training, a test evaluates your agent: the “test”
function loads the .npy/.pt file and measures its performance.

After testing, a reward record of the training process is saved as a .npy file. Run
plot.py with the following command to produce four graphs (taxi.png, cartpole.png,
DQN.png, compare.png) showing your training processes:

python plot.py [-h] [--taxi] [--cartpole] [--DQN] [--compare]
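
If you want to eyeball a reward record yourself before running plot.py, a short sketch
(the filename follows the Submission layout below; the matplotlib usage here is
illustrative, not how plot.py itself works):

import numpy as np
import matplotlib.pyplot as plt

rewards = np.load("taxi_rewards.npy")

# A 50-episode moving average makes the learning trend readable.
window = 50
smoothed = np.convolve(rewards, np.ones(window) / window, mode="valid")

plt.plot(rewards, alpha=0.3, label="per-episode reward")
plt.plot(np.arange(window - 1, len(rewards)), smoothed, label="50-episode average")
plt.xlabel("episode")
plt.ylabel("reward")
plt.legend()
plt.savefig("taxi_preview.png")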

Report (50%)
● A report is required.
● The report should be written in English.
● Please save the report as a .pdf file. (font size: 12)
● Answer the questions in the report template in detail.

Submission
Please package your source code, tables, networks, reward records, graphs, and
report (.pdf) into STUDENTID_hw4.zip.

📂{student_id}_hw4.zip
┣ 📂Plots
┃ ┣ 📜taxi.png
┃ ┣ 📜cartpole.png
┃ ┣ 📜DQN.png
┃ ┗ 📜compare.png
┣ 📂Rewards
┃ ┣ 📜taxi_rewards.npy
┃ ┣ 📜cartpole_rewards.npy
┃ ┗ 📜DQN_rewards.npy
┣ 📂Tables
┃ ┣ 📜taxi_table.npy
┃ ┣ 📜cartpole_table.npy
┃ ┗ 📜DQN.pt
┣ 📜taxi.py
┣ 📜cartpole.py
┣ 📜DQN.py
┗ 📜report.pdf
e.g. 110123456_hw4.zip

An incorrect submission format results in a 10-point deduction.

Late Submission Policy


20% deduction per late day

Reference

1. Taxi-v3 environment
2. CartPole-v0 environment
3. PyTorch DQN tutorial
