Lab 8
Unsupervised Learning
For this lab, we'll use unsupervised learning (k-Means clustering) on an image understanding task. Unsupervised learning tends to be more resource-intensive than supervised learning, so this is a good point to switch to working on Amazon's EC2 compute cloud.
One issue with unsupervised learning is that it can be difficult to determine how "good" a model is, and therefore difficult to tune model parameters such as dimension. The k-Means model has a basic fit measure (the total squared distance of the points from their centroids), but this doesn't tell us much. For that reason we'll wrap a k-NN classification task around k-Means, and then we can use prediction error to evaluate various clusterings. That is, we'll cluster some labeled image data into labeled clusters, and then use k-NN on the cluster centroids to compute labels for new data points.
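The idea of wrapping k-NN around k-Means can be sketched in a few lines of Scikit-Learn. This is only an illustrative sketch, not the lab notebook's code: it assumes the small digits dataset bundled with Scikit-Learn as a stand-in for the lab's image data, labels each cluster by majority vote of its members (one possible labeling rule), and uses 1-nearest-neighbor on the centroids. The helper names `label_clusters` and `centroid_knn_accuracy` are made up for this example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def label_clusters(assignments, labels, n_clusters):
    """Give each cluster the most common true label among its members."""
    cluster_labels = np.zeros(n_clusters, dtype=labels.dtype)
    for c in range(n_clusters):
        members = labels[assignments == c]
        if members.size > 0:  # an empty cluster keeps the default label 0
            cluster_labels[c] = np.bincount(members).argmax()
    return cluster_labels


def centroid_knn_accuracy(n_clusters, seed=0):
    """Cluster labeled training data, then classify held-out points
    by the label of their nearest centroid (1-NN on the centroids)."""
    # Assumption: the bundled digits images stand in for the lab dataset.
    X, y = load_digits(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )

    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(X_train)
    cluster_labels = label_clusters(km.labels_, y_train, n_clusters)

    # k-NN with k=1 fitted on the centroids, not on the raw training points.
    knn = KNeighborsClassifier(n_neighbors=1).fit(km.cluster_centers_, cluster_labels)

    # inertia_ is the total squared distance of training points to their centroids.
    return km.inertia_, knn.score(X_test, y_test)


if __name__ == "__main__":
    for k in (10, 30, 100):
        inertia, accuracy = centroid_knn_accuracy(k)
        print(f"k={k:4d}  inertia={inertia:12.1f}  held-out accuracy={accuracy:.3f}")
```

Running the script prints both measures for a few cluster counts. The total squared distance generally keeps shrinking as clusters are added, so it cannot by itself say which clustering is better, whereas the held-out accuracy gives a number that can be compared across settings.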
We'll use Scikit-Learn's k-Means clusterer as a performance baseline. We'll also use BIDMach's k-Means clusterer to scale up the model and the dataset size. Since you're new to BIDMach (and the Scala language it uses), we'll start with a warmup notebook. You can do this before lab, and it doesn't require the EC2 instance (in fact it's a waste to use it, since the calculations are very short). You already have BIDMach installed in your VM. You can run BIDMach in a notebook that is based on IPython but actually runs IScala.
Warmup
Please try to do this part before lab if you can. Start your laptop VM, and open a terminal window inside. cd to
/opt/BIDMach/tutorials
and then type:
bidmach notebook
(This assumes that there is a link from ~/bin/bidmach to /opt/BIDMach/bidmach. If you don't have one, add it now.)
This will bring up a browser with a directory display just like IPython Notebooks. Select the notebook:
BIDMat_intro.ipynb
and you will be in an IPython-like page. The cells are actually evaluated in Scala/BIDMat, however. Scala is a high-level language that compiles to run on the Java Virtual Machine. It supports line-by-line JIT compilation, so it feels like an interpreted language, but it's actually compiling everything you type. BIDMat is a collection of Scala classes that create a Matlab/Numpy-like environment (it's much closer to Matlab). You should find BIDMat quite similar to those environments, modulo a few syntactic differences ("()" instead of "[]" for collection access and "?" instead of ":" for wildcards). Work through the BIDMat_intro notebook. If you have enough time, you can also work through the "Scala_intro" notebook, which covers Scala lists and dictionaries.
Documentation
A variety of BIDMat/BIDMach documentation is available from this link.
Once you've completed the warmup, you're ready to move on to EC2:
Starting and Connecting to Your EC2 Instance
Follow this link for directions on how to start/stop and connect to your instance.
Follow this link for a script to connect to your instance from Mac or Linux.
Follow this link for directions on how to setup Putty for connecting from Windows.
It's a good idea to test everything before coming to lab. It's fine to start your instance for that purpose. Just try not to do it multiple times (coordinate with your teammates).
The Main Lab
The main lab comprises two notebooks. The first is a Python notebook. You run this with
ipython notebook
on your EC2 instance.
The second is a BIDMach notebook. You run this with
bidmach notebook
on your EC2 instance.