CS 194-16 Introduction to Data Science Fall 2015
Organizations use their data for decision support and to build data-intensive products and services. The collection of skills required by organizations to support these functions has been grouped under the term Data Science. This course will attempt to articulate the expected output of Data Scientists and then equip the students with the ability to deliver against these expectations. The assignments will involve programming, statistics, and the ability to manipulate data sets with code.
Notes on CS294-16: The graduate version of the class, CS294-16, is a "mezzanine" class that is part of the Master of Engineering curriculum. It's not normally available to other graduate students except under special circumstances.
Logistics
Course Number: CS 194-16, CS 294-16 Fall 2015, UC Berkeley
Instructor: John Canny
Time: MW 5pm - 6:30pm
Location: 155 Donner Lab through 9/15/2015, then 310 Jacobs Hall
Teaching Assistants: Haoyu Chen
Discussion: Join Piazza for announcements and to ask questions about the course
Office hours:
John Canny - M 3-4, W 2-3 at 637 Soda
GSI - Tue 3-4, Fri 3-4 at 283H Soda
Pre-requisites
Pre-requisites for this course include 61A and 61B and basic programming skills. Knowledge of Python will be useful for the assignments, and several will also use the Scala language. Students will also be expected to run VirtualBox on their laptops for the assignments.
Please take the class survey here.
Please set up your machine according to these instructions. There are currently many issues with Windows 10, so don't upgrade if you can avoid it.
Make sure you set up and test your VM before the first lab on 9/2.
Texts
There is no single textbook, and readings will be posted lecture by lecture. However, there are a couple of books that are particularly useful and we will reference them repeatedly:
Grading
Midterm Info
The midterm is next Monday 11/23 from 5-6:30pm. It's closed-book, but you can bring a single 8.5x11 sheet with notes on both sides.
Solutions
Homework Solutions: Homework Solutions
Lab Solutions: Lab Solutions
Midterm Fall 2014 and Solutions
Nano-Quizzes for each lecture
Schedule
Mon Lecture
Lecture Topic
Weds Lab
Reading
Assignments (Thursday)
W 8/26 Bunny 1 due 8/28
L1: Introduction/Data Science Process [pptx]
[pdf]
No lab (first lecture instead)
Chapter 1 of Data Science from Scratch
Enterprise Data Analysis and Visualization: An Interview Study
none
M 8/31 Bunny 2 due 5pm 8/31
L2: Data Collection and Exploration [pptx]
[pdf]
Lab 1 Unix
CH 9 of Data Science from Scratch
Sections 7.1-7.2 and 12 of Computational Biology 2nd ed. A Practical Intro... by Röbbe Wünschiers
9/3 Homework 1 out. Due by 9/10
M 9/7 no lecture, Bunny 3 due 9/10
Labor Day Holiday
Lab 2 Exploratory Data Analysis
CH 3 and CH 10 of Data Science from Scratch
9/10 Homework 1 due by 10pm!
Homework 2 out. Due by 9/17
M 9/14 (155 Donner) Bunny 4 due 9/14
L3: Tabular Data Processing [pptx]
[pdf]
Lab 3 Pandas in 110&120 Jacobs Hall
CH 5 and CH 7 of Python for Data Analysis
9/17 Homework 2 due by 10pm
Project Proposal Out, due 9/25
M 9/21 (155 Donner) Bunny 5 due 9/21
L4: Featurization and statistical tests [pptx]
[pdf]
Lab 4 Project Planning in 110&120 Jacobs Hall
CH 5 and CH 7 of Data Science from Scratch. Please review CH 6 on probability theory if needed.
9/25 Project Proposal due Midnight
Homework 3 out, due 10/1
M 9/28 (310 Jacobs!) Bunny 6 due 9/28
L5: Natural Language Processing [pptx]
[pdf]
Lab 5 Stats and NLP tools in 310 Jacobs
Required: Analyzing Sentence Structure, CH 8 of the NLTK book (skip section 8.4).
Background: Stanford Dependencies, Stanford Parser online docs.
10/1 Homework 3 due by 10pm!
Project Data Exploration out, due 10/8
M 10/5 Bunny 7 due 10/5
L6: Supervised Learning: kNN, Naive Bayes [pptx]
[pdf]
Lab 6 Supervised Learning
CH 12 and CH 13 of Data Science from Scratch
10/8 Project Data Exploration Due!
Homework 4 out, due 10/15
M 10/12 Bunny 8 due 10/12
L7: Supervised Learning: Linear and Logistic Regression, Trees and Forests pptx
and pdf
Lab 7 Supervised Learning
CH 14, CH 16 and CH 17 of Data Science from Scratch
10/15 Homework 4 due
Project Preliminary Data Analysis out, due 10/23
M 10/19 Bunny 9 due 10/19
L8: Unsupervised Learning: k-Means, DBSCAN, matrix factorization [pptx]
and [pdf]
Lab 8 Unsupervised Learning
CH 19 of Data Science from Scratch, Wikipedia entry on DBSCAN, and Tutorial (in Python) on matrix factorization.
10/23 Project Preliminary Data Analysis due
Homework 5 out, due 10/30
M 10/26 Bunny 10 due 10/26
L9: Deep Learning for images and text, RNNs. [pdf slides]
Lab 9 CaffeNet and LSTMs
Introduction to Neural Nets, chapter 1 from Neural Networks and Deep Learning, Michael Nielsen; Deep Learning and Caffe, Evan Shelhamer; LSTM Tutorial
10/30 Homework 5 due
Homework 6 out, due 11/5
M 11/2 Bunny 11 due 11/2
L10: Scaling Up Analytics [pptx]
[pdf]
Lab 10 Spark/EC2
"MapReduce", "Word Frequency Problem", and "Other Examples of MapReduce" sections from the O'Reilly "Doing Data Science" book (available online or from the library), and the Spark short paper
11/5 Homework 6 due
Homework 7 out, due 11/13
M 11/9 Bunny 12 due 11/9
L11: Interactive Visualization [pptx]
and [pdf]
No Lab (Veterans Day)
Chapter 9 on Data Visualization from "Doing Data Science", available online or from the library. D3: Data-Driven Documents by Bostock et al.
11/13 Homework 7 due
M 11/16 Bunny 13 due 11/16
L12: Graph Processing [pdf]
Lab 11 Visualization
Chapter 2 from "Networks, Crowds, and Markets: Reasoning About a Highly Connected World"
M 11/23
Midterm, 5:00 to 6:30 pm
EC2 usage estimate
Non-instructional day, no lab
M 11/30
Project Presentations
Weds: Project Presentations
Weds 12/9
12:30-2:30 Project Posters in 310 Jacobs (some poster examples)
Mon 12/14
Final Project Reports due 10pm (see also the project comments)
Archive Project Report?