Setting Up Your Environment
In this class, we will all be using the same Virtual Machines to complete assignments and class exercises. We have configured a Virtual Machine Image with a recent version of Linux, Python 2.7, and several libraries we’ll be using throughout the class.
Note: While you should also be able to set up a similar environment on your machine without needing a Virtual Machine, the course staff will not support such configurations - so you’re on your own if you choose to go that route!
To create a VM using our disk image:
- Download and install VirtualBox.
- Download and unzip our VirtualBox disk image from here (zip file) or here (tgz file; try this one on OS X).
- Open VirtualBox and click the 'New' button.
- Select the following options in the VM creation wizard that appears:
  - Name and operating system
    - Type: Linux
    - Version: Ubuntu (64-bit)
    - Memory size: at least 1024 MB
  - Hard drive
    - Use an existing virtual drive file; select the disk image (.vdi file) you unzipped
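The wizard choices above can also be written down as VBoxManage command lines, which is handy if you need to recreate the VM. The following is only a sketch, not a procedure the course staff supports: the VM name `datascience-vm`, the disk file name `datascience.vdi`, and the controller label `SATA` are hypothetical placeholders, and the script prints the commands rather than running them.

```python
# Sketch: build VBoxManage commands that mirror the wizard choices above.
# The VM name, disk path, and controller label are hypothetical placeholders.

def build_vm_commands(vm_name, vdi_path, memory_mb=1024, ostype="Ubuntu_64"):
    """Return a list of argument lists, one per VBoxManage invocation."""
    controller = "SATA"  # assumed label for the storage controller
    return [
        # Name and operating system (Type: Linux, Version: Ubuntu 64-bit)
        ["VBoxManage", "createvm", "--name", vm_name,
         "--ostype", ostype, "--register"],
        # Memory size
        ["VBoxManage", "modifyvm", vm_name, "--memory", str(memory_mb)],
        # Hard drive: add a controller, then attach the existing .vdi file
        ["VBoxManage", "storagectl", vm_name, "--name", controller,
         "--add", "sata"],
        ["VBoxManage", "storageattach", vm_name, "--storagectl", controller,
         "--port", "0", "--device", "0", "--type", "hdd",
         "--medium", vdi_path],
    ]


if __name__ == "__main__":
    # Print the commands so they can be reviewed before being run by hand.
    for cmd in build_vm_commands("datascience-vm", "datascience.vdi"):
        print(" ".join(cmd))
```

For the 32-bit image, passing `ostype="Ubuntu"` instead of the default should correspond to the Ubuntu (32-bit) choice in the wizard.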
Then, start the VM by selecting it and pressing the 'Start' button at the top of the VirtualBox VM Manager window. After you start the VM, you should be able to open a terminal by clicking the icon with a black rectangle on the left-hand side of the VM's screen.
To test that your machine is set up properly, run the following from a terminal window:
ipython notebook
- In the browser window that pops up, create a new notebook, and enter the following in the first cell:
%pylab inline
x = np.random.randn(5000)
plt.hist(x, 50)
(Note: you can share copy-and-paste between your host OS and your VM if you select an option under "Devices -> Shared Clipboard" from the VirtualBox menu while running your VM.)
- Hit the ‘Play’ button on the toolbar.
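- You should end up with a histogram of the 5000 random samples, roughly bell-shaped and centered near zero.
- Close Firefox, then press Control-C in the terminal and answer y to shut down the notebook server.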
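If it helps to see what the test cell computes, here is a minimal sketch of the same idea using only the Python standard library: it draws 5000 normally distributed samples and counts them in 50 equal-width bins, which is roughly what the histogram call plots. The `histogram` helper is written for this illustration and is not part of numpy or matplotlib.

```python
import random


def histogram(samples, bins):
    """Count samples in `bins` equal-width intervals spanning min..max."""
    lo, hi = min(samples), max(samples)
    width = (hi - lo) / float(bins)
    counts = [0] * bins
    for s in samples:
        # The largest sample sits on the right edge; keep it in the last bin.
        idx = min(int((s - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


random.seed(0)  # fixed seed so repeated runs give the same samples
x = [random.gauss(0.0, 1.0) for _ in range(5000)]
counts = histogram(x, 50)

# Crude text rendering: one row per bin, bar length scaled to the tallest bin.
peak = max(counts)
for c in counts:
    print("#" * (40 * c // peak))
```

The rows in the middle should be the longest, matching the bell shape you see in the notebook plot.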
Credentials
The VM should automatically log you into an account called datascience. The password for this account is datascience. You can run commands with root (administrator) privileges using sudo the-command and entering this password.
Information you should not need
No 64-bit support
In the unlikely event that your machine cannot support a 64-bit virtual machine, we also have a 32-bit image. Configuration instructions are the same, except that you should select Version: Ubuntu (32-bit) instead of Version: Ubuntu (64-bit).
We believe that machines that cannot support a 64-bit VM are extremely rare. Almost all recent processors support 64-bit virtualization, and VirtualBox can run 64-bit VMs even on machines with 32-bit OSes. The only glitch is that some PCs and laptops may have hardware virtualization disabled in the BIOS.
Please inform the course staff if you need this. In particular, we may need to build a 32-bit version of software for later in the course to support your configuration.
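If your host machine runs Linux, one rough way to see what the processor reports is to look at the flags in /proc/cpuinfo: `lm` indicates a 64-bit capable CPU, and `vmx` (Intel) or `svm` (AMD) indicates hardware virtualization support. The sketch below assumes a Linux host with an x86 processor; the function name is made up for this example. Note that these flags describe the CPU, so they do not tell you whether the BIOS setting is enabled.

```python
def virtualization_flags(cpuinfo_text):
    """Return which of the flags lm, vmx, svm appear in /proc/cpuinfo text."""
    wanted = ("lm", "vmx", "svm")
    found = set()
    for line in cpuinfo_text.splitlines():
        # Each processor entry has a line of the form "flags : fpu vme ..."
        if line.startswith("flags") and ":" in line:
            flags = line.split(":", 1)[1].split()
            found.update(f for f in flags if f in wanted)
    return found


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as fh:
            flags = virtualization_flags(fh.read())
    except IOError:
        # Not a Linux host, or the file could not be read.
        flags = None

    if flags is None:
        print("Could not read /proc/cpuinfo on this machine.")
    else:
        print("64-bit CPU (lm): %s" % ("lm" in flags))
        print("Hardware virtualization (vmx or svm): %s"
              % bool(flags & set(["vmx", "svm"])))
```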
Software installed in the VM
The virtual machine image we supplied was set up using this script, which should also work on an Ubuntu-like system (but this is not supported by the course staff, so use it at your own risk). The notable pieces of software this installs are:
- ipython
- pandas
- OpenRefine
- Spark
- R
- numpy
- scipy
- matplotlib
- scikit-learn
- python-Levenshtein
- graphviz and pydot
- BIDMach
- gawk
- IScala
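A quick way to confirm that the Python libraries in this list are importable is a small script like the following. This is a sketch rather than an official check: the import names `sklearn` (for scikit-learn) and `Levenshtein` (for python-Levenshtein) are assumptions about how those packages are exposed, and non-Python tools such as Spark, R, and gawk are not covered.

```python
import importlib


def check_modules(names):
    """Map each module name to its version string, or None if it is missing."""
    results = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            results[name] = None
            continue
        # Not every module defines __version__, so fall back to a marker.
        results[name] = getattr(module, "__version__", "unknown")
    return results


# Assumed import names for the Python libraries listed above.
MODULES = ["IPython", "pandas", "numpy", "scipy", "matplotlib",
           "sklearn", "Levenshtein", "pydot"]

if __name__ == "__main__":
    for name, version in sorted(check_modules(MODULES).items()):
        status = version if version is not None else "NOT FOUND"
        print("%-12s %s" % (name, status))
```

Any line reporting NOT FOUND points at a library that could not be imported under that name in the Python you ran the script with.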