Assignment 4 Description

In this assignment you will implement two reinforcement learning algorithms, Deep Q-Learning and Policy Gradients, and apply them to an OpenAI Gym environment.

The goals of this assignment are as follows:

  • Understand the structure of model-free reinforcement learning algorithms
  • Compare policy-based and value-based methods for reinforcement learning

Setup

Download the code as a zip file here. As for the dependencies:

[Option 1] Use Anaconda: The preferred approach for installing all the assignment dependencies is to use Anaconda, a Python distribution that includes many of the most popular Python packages for science, math, engineering and data analysis. Once you install it, you can skip all mentions of requirements and go directly to working on the assignment.

[Option 2] Manual install, virtual environment: If you do not want to use Anaconda and prefer a more manual and risky installation route, you will likely want to create a virtual environment for the project. If you choose not to use a virtual environment, it is up to you to make sure that all dependencies for the code are installed globally on your machine.

Install OpenAI Gym: Directions for installing OpenAI Gym are given here:
https://github.com/openai/gym#installation

Gym is pre-built for OS X and Linux. For Windows you will need to use either the Docker container provided or a virtual machine such as VirtualBox.

The code for this assignment requires only numpy; there are no additional sources to compile.

Start IPython: Start the IPython notebook server from the assignment4 directory. If you are unfamiliar with IPython, you should read our IPython tutorial.

Q1: Reinforcement Learning via Deep Q-Learning (40 points)

The IPython notebook DQN.ipynb will walk you through the implementation of Deep Q-Learning running on a simple simulated world (CartPole).
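
As a rough orientation before opening the notebook, the core of Q-learning is the one-step target r + gamma * max_a' Q(s', a'). The sketch below computes that target for a batch with numpy. The function name dqn_targets, its argument layout, and the default gamma are illustrative assumptions, not the notebook's actual interface.

```python
import numpy as np

def dqn_targets(rewards, next_q_values, dones, gamma=0.99):
    """One-step Q-learning targets for a batch of transitions.

    rewards:       shape (batch,), reward received at each step
    next_q_values: shape (batch, num_actions), Q estimates for the next states
    dones:         shape (batch,), 1.0 where the episode ended at this step
    (Hypothetical helper; names and shapes are assumptions.)
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    next_q_values = np.asarray(next_q_values, dtype=np.float64)
    dones = np.asarray(dones, dtype=np.float64)
    # Greedy value of the next state under the current Q estimates.
    best_next = next_q_values.max(axis=1)
    # Terminal transitions do not bootstrap from the next state.
    return rewards + gamma * (1.0 - dones) * best_next
```

The network is then trained to move Q(s, a) for the action actually taken toward these targets, typically with a squared-error loss.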

Q2: Reinforcement Learning via Policy Gradients (40 points)

The IPython notebook PG.ipynb will walk you through the implementation of vanilla Policy Gradients, running on the same simulated world.
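
To give a feel for what vanilla Policy Gradients involves, the sketch below shows two pieces that commonly appear in such an implementation: discounted returns for one episode, and the gradient of the weighted log-probability with respect to the logits of a softmax policy. Both function names and the default gamma are assumptions made for illustration; the notebook may organize this differently.

```python
import numpy as np

def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ... for each t."""
    returns = np.zeros(len(rewards), dtype=np.float64)
    running = 0.0
    # Accumulate backwards so each step reuses the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def softmax_logit_gradient(probs, action, advantage):
    """Gradient of advantage * log pi(action) w.r.t. the softmax logits.

    For a softmax policy this is advantage * (one_hot(action) - probs).
    (Hypothetical helper; the name is an assumption.)
    """
    grad = -np.asarray(probs, dtype=np.float64)  # negation makes a fresh array
    grad[action] += 1.0
    return advantage * grad
```

Summing these per-step gradients over an episode and taking a gradient ascent step raises the probability of actions that were followed by high returns.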

Q3: Do something extra! (up to +20 points)

Try a different environment, or a fancier front end. The estimators for our RL models were simple fully connected (FC) networks, but complex environments will require CNN estimators like those in the original DQN paper. There are many improvements that can be made to the models; try, for example, multi-step DQN or Trust Region Policy Optimization (TRPO).
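
If you try the multi-step variant of DQN, the main change is the target: instead of one reward plus a bootstrapped value, it sums n discounted rewards and bootstraps from the state n steps ahead. A minimal sketch follows, assuming a hypothetical helper n_step_target that receives the n rewards and the Q estimates for the state reached after them.

```python
import numpy as np

def n_step_target(rewards, bootstrap_q_values, gamma=0.99, done=False):
    """n-step target: sum_k gamma^k * r_{t+k} + gamma^n * max_a Q(s_{t+n}, a).

    rewards:            the n rewards r_t, ..., r_{t+n-1}
    bootstrap_q_values: Q estimates for s_{t+n}, one entry per action
    done:               True if the episode ended within these n steps
    (Hypothetical helper; names are assumptions.)
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    n = len(rewards)
    discounts = gamma ** np.arange(n)
    partial = float(np.dot(discounts, rewards))
    if done:
        # No bootstrapping past the end of the episode.
        return partial
    return partial + gamma ** n * float(np.max(bootstrap_q_values))
```

With n = 1 this reduces to the usual one-step Q-learning target.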

You can check your submission against a standard Python installation in VirtualBox. First download and install VirtualBox from here. Then download this zip file, save it, and unzip it.

  • Open VirtualBox and click the 'New' button.
  • Select the following options in the VM creation wizard that appears:
    • Name and operating system
      • Type: Linux
      • Version: Ubuntu (64-bit)
    • Memory size: at least 1024 MB, preferably half the physical memory on your machine.
    • Hard drive
      • Use an existing virtual drive file; select the disk image (.vdi file) you unzipped
    • CPU cores
      • Under Settings→System→Processor
      • allocate half the machine's cores to the virtual machine

Then you can start your virtual machine and test the assignment inside it. You can mount directories from your host machine or use the network to copy the assignment into the VM. The account name is "deep" and the password is "deep". The account has sudo access. Sorry, only Python 2.7 is supported for now.

Submitting your work:

Once you are done working, run the collectSubmission.sh script; this will produce a file called assignment4.zip. Submit this file at the end of this assignment.