Lab 8

Unsupervised Learning

For this lab, we'll use unsupervised learning (k-Means clustering) on an image understanding task. Unsupervised learning tends to be more resource-intensive than supervised learning, so this is a good point to switch to working on Amazon's EC2 compute cloud.

One issue with unsupervised learning is that it can be difficult to determine how "good" a model is, and therefore difficult to tune model parameters such as the model dimension (for k-Means, the number of clusters). The k-Means model has a basic fit measure (the total squared distance of the points from their assigned centroids), but this doesn't tell us much, since it generally keeps decreasing as more clusters are added. For that reason we'll wrap a k-NN classification task around k-Means, and then we can use prediction error to evaluate various clusterings. That is, we'll cluster some labeled image data into labeled clusters, and then use k-NN on the cluster centroids to compute labels for new data points.
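The evaluation scheme described above can be sketched in a few lines of Scikit-Learn. This is only an illustration under several assumptions, not the lab's notebook code: it uses Scikit-Learn's small bundled digits dataset as a stand-in for the lab's image data, labels each cluster by majority vote of its members, and uses 1-nearest-neighbor on the centroids. The helper names label_centroids and clustering_error are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def label_centroids(cluster_ids, y, k):
    """Give each cluster the most common true label among its members.

    Majority voting is an assumption; the lab may label clusters differently.
    """
    labels = np.zeros(k, dtype=int)
    for c in range(k):
        members = y[cluster_ids == c]
        if members.size > 0:  # an empty cluster keeps the default label 0
            labels[c] = np.bincount(members).argmax()
    return labels


def clustering_error(X_train, y_train, X_test, y_test, k, seed=0):
    """Return (fit measure, test error) for a k-cluster model."""
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X_train)
    centroid_labels = label_centroids(km.labels_, y_train, k)
    # 1-NN on the centroids: a new point gets the label of its nearest centroid.
    knn = KNeighborsClassifier(n_neighbors=1)
    knn.fit(km.cluster_centers_, centroid_labels)
    error = float(np.mean(knn.predict(X_test) != y_test))
    # inertia_ is the total squared distance of points from their centroids.
    return km.inertia_, error


# Stand-in data: 8x8 digit images bundled with Scikit-Learn (no download).
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0
)

results = {k: clustering_error(X_train, y_train, X_test, y_test, k)
           for k in (10, 40)}
for k, (inertia, error) in results.items():
    print(f"k={k:3d}  inertia={inertia:.0f}  test error={error:.3f}")
```

Comparing the two columns for different k shows why the wrapper is useful: the fit measure alone cannot tell you which k to pick, while the held-out prediction error gives a number you can compare across clusterings.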

We'll use Scikit-Learn's k-Means clusterer as a performance baseline. We'll also use BIDMach's k-Means clusterer to scale up the model and the dataset size. Since you're new to BIDMach (and the Scala language it uses), we'll first start with a warmup notebook. You can do this before lab, and it doesn't require the EC2 instance (in fact, it's a waste to use it, since the calculations are very short). You already have BIDMach installed in your VM. You can run BIDMach in a notebook, which is based on IPython but is actually IScala.

Warmup

Please try to do this part before lab if you can. Start your laptop VM, and open a terminal window inside. cd to

/opt/BIDMach/tutorials

and then type:

bidmach notebook

(This assumes that there is a link from ~/bin/bidmach to /opt/BIDMach/bidmach. If you don't have one, add it now, for example with ln -s /opt/BIDMach/bidmach ~/bin/bidmach.)

This will bring up a browser with a directory display just like IPython Notebooks. Select the notebook:

BIDMat_intro.ipynb

and you will be in an IPython-like page. The cells are actually evaluated in Scala/BIDMat, however. Scala is a high-level language that compiles to bytecode for the Java Virtual Machine. It supports line-by-line compilation, which makes it feel like an interpreted language, but it's actually compiling everything you type. BIDMat is a collection of Scala classes that create a Matlab/NumPy-like environment (it's much closer to Matlab). You should find BIDMat quite similar to those environments, modulo a few syntactic differences ("()" instead of "[]" for collection access and "?" instead of ":" for wildcards). Work through the BIDMat_intro notebook. If you have enough time, you can also work through the "Scala_intro" notebook, which covers Scala lists and dictionaries.
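As a quick reference for the two syntactic differences mentioned above, here is the NumPy side of the comparison. The BIDMat forms in the comments are simply the rule from the text applied by hand ("()" for "[]", "?" for ":"); they have not been run and are not taken from the notebook, so treat them as a guess to check against BIDMat_intro.

```python
import numpy as np

# A small 2x3 matrix: [[0, 1, 2], [3, 4, 5]]
a = np.arange(6).reshape(2, 3)

elem = a[1, 2]   # single element;  BIDMat form per the rule: a(1, 2)
col = a[:, 0]    # whole column 0;  BIDMat form per the rule: a(?, 0)
row = a[1, :]    # whole row 1;     BIDMat form per the rule: a(1, ?)

print(elem, col, row)
```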

Documentation

A variety of BIDMat/BIDMach documentation is available from this link.

Once you've completed the warmup, you're ready to move on to EC2:

Starting and Connecting to Your EC2 Instance

Follow this link for directions on how to start/stop and connect to your instance.

Follow this link for a script to connect to your instance from Mac or Linux.

Follow this link for directions on how to setup Putty for connecting from Windows.

It's a good idea to test everything before coming to lab. It's fine to start your instance for that purpose. Just try not to do it multiple times (coordinate with your teammates).

The Main Lab

The main lab comprises two notebooks. The first is a Python notebook. You run this with

ipython notebook

on your EC2 instance.

The second is a BIDMach notebook. You run this with

bidmach notebook

on your EC2 instance.