Assignment 4 Description

In this assignment, you will implement basic versions of two core algorithms in deep reinforcement learning: Deep Q-Learning (DQN) and Policy Gradient (PG, REINFORCE). The goals of this assignment are as follows:

  • Implement feed-forward Multi-Layer Perceptrons (MLPs) for policies.
  • Implement action-sampling methods: epsilon-greedy exploration and sampling from stochastic policies.
  • Build the loss functions for both algorithms.
  • Compute returns and baselines.
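The two action-sampling methods in the list above can be sketched in plain Python. This is a minimal, framework-free illustration, not the notebooks' code: the names `epsilon_greedy`, `softmax`, and `sample_from_policy` are hypothetical, and the notebooks may implement the same ideas with PyTorch tensors.

```python
import math
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a uniformly random action, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Greedy choice: index of the largest Q-value (first index wins on ties).
    return max(range(len(q_values)), key=lambda a: q_values[a])

def softmax(logits):
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_from_policy(logits, rng=random):
    """Sample an action index from the categorical distribution softmax(logits)."""
    probs = softmax(logits)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

For example, `epsilon_greedy([0.1, 0.5, 0.2], epsilon=0.0)` always returns `1`, while larger `epsilon` values trade exploitation for exploration. DQN typically uses epsilon-greedy on its Q-values, whereas a policy gradient agent samples directly from its stochastic policy.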

All parts of the assignment must be submitted by 11PM on Monday 4/23.

Setup

You can work on the assignment in one of three ways: locally on your own machine, on the EECS instructional machines, or on a virtual machine in EC2 (not recommended for this assignment). See the Assignment 2 Description page for more information on these options.

GPU Resources

GPUs are not required for this assignment. The policies should train in a few minutes on a regular workstation, so using a GPU will not give a significant speedup.


Working on the assignment

Get the code as a zip file here.

Set up the environment

Detailed instructions can be found in setup_instructions.pdf. Only the first two sections are required; the remaining sections cover visualization only, which is optional.

Start IPython

After you have downloaded the assignment, start the Jupyter (IPython) notebook server from the assignment3 directory with the jupyter notebook command. You are required to complete the notebooks simple_dqn-release.ipynb and simple_pg-pytorch-release.ipynb. Look for the "*** YOUR CODE HERE ***" markers, fill them in, and make sure your code runs and trains properly.

Submitting your work

Whether you work on the assignment locally or on EC2, once you have finished all parts of the assignment, print the two notebooks as PDFs (including the text outputs) and concatenate them into a single file, with the DQN notebook first and the policy gradient notebook second. Please submit this file here to make a submission on Gradescope (main grading!).

Also compress the assignment directory into a zip file and upload it here to bCourses.


Assignment Tasks

Q1: DQN (50 points)

The Jupyter notebook simple_dqn-release.ipynb will walk you through implementing the DQN algorithm and training a policy to solve a GridWorld (this should take under 5 minutes). You can also use it to train a policy to play the Atari game Pong (this can take up to a few hours and is not graded).
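The core of the DQN loss is a regression of Q(s, a) onto a Bellman target. The sketch below shows that computation on plain Python lists under stated assumptions: `next_q_values` is assumed to come from a separate target network (as in standard DQN), terminal transitions are assumed not to bootstrap, and a squared error is used. The function names are hypothetical, and whether the notebook expects squared error or a Huber loss is not specified here.

```python
def dqn_targets(rewards, next_q_values, dones, gamma):
    """Bellman targets y = r + gamma * (1 - done) * max_a' Q_target(s', a')."""
    targets = []
    for r, q_next, done in zip(rewards, next_q_values, dones):
        # Terminal transitions have no future value to bootstrap from.
        bootstrap = 0.0 if done else gamma * max(q_next)
        targets.append(r + bootstrap)
    return targets

def dqn_loss(q_values, actions, targets):
    """Mean squared TD error between Q(s, a) of the taken actions and the targets."""
    errors = [q[a] - y for q, a, y in zip(q_values, actions, targets)]
    return sum(e * e for e in errors) / len(errors)
```

For a batch with rewards `[1.0, 0.0]`, next-state Q-values `[[0.5, 2.0], [3.0, 1.0]]`, `dones = [False, True]` and `gamma = 0.9`, the targets are about `[2.8, 0.0]`. In a PyTorch implementation the targets would be computed without gradient tracking, so only the online network's Q(s, a) is updated.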

Q2: Policy Gradients (50 points)

The Jupyter notebook simple_pg-release.ipynb will walk you through implementing a simple policy gradient algorithm (REINFORCE) with a time-dependent baseline. You will train a point-mass agent to reach a goal and can visualize training as it happens (this should take under 5 minutes).
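REINFORCE with a time-dependent baseline needs discounted returns, a baseline per time step, and a surrogate loss. The sketch below assumes one common choice of baseline, the average return at step t over the trajectories still running at that step; the notebook's exact definition may differ. The helper names are hypothetical, and the log-probabilities are plain floats here, whereas in PyTorch they would be autograd tensors and the advantages G_t - b_t would be treated as constants.

```python
def discounted_returns(rewards, gamma):
    """Reward-to-go: G_t = sum_{k >= t} gamma^(k - t) * r_k, computed backwards."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def time_dependent_baseline(all_returns):
    """b_t = mean of G_t over the trajectories that are still running at step t."""
    horizon = max(len(rets) for rets in all_returns)
    baseline = []
    for t in range(horizon):
        values = [rets[t] for rets in all_returns if t < len(rets)]
        baseline.append(sum(values) / len(values))
    return baseline

def reinforce_loss(log_probs, all_returns, baseline):
    """Surrogate loss: -mean over all (trajectory, t) of log pi(a_t|s_t) * (G_t - b_t)."""
    terms = []
    for lps, rets in zip(log_probs, all_returns):
        for t, (lp, g) in enumerate(zip(lps, rets)):
            terms.append(-lp * (g - baseline[t]))
    return sum(terms) / len(terms)
```

For instance, `discounted_returns([1.0, 1.0, 1.0], 0.5)` gives `[1.75, 1.5, 1.0]`. Minimizing the surrogate loss raises the log-probability of actions whose return beats the baseline and lowers it otherwise; subtracting the baseline reduces variance without biasing the gradient estimate.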