Course Syllabus

Course Description:

This course follows STAT 215A, which covered fundamentals of linear regression, exploratory data analysis, and prediction models more generally. This semester, we will focus more on areas where estimation and design are as important as prediction.

Prerequisites: STAT 215A, Linear Algebra. If you have not taken STAT 215A, then you need to have taken a rigorous course in regression that teaches regression concepts using linear algebra and covers ideas of model selection and regularization (like lasso and ridge regression). I will also assume knowledge, at a similar level, of other standard EDA methods such as PCA. Finally, I assume fluency in probability and statistical theory at the level expected in any graduate-level class in statistics or machine learning.

Weekly Summary (tentative)

Week Topic
1-3 Basic Design and Analysis of Experiments
4-6 Random Effects Models
7-8 Nonparametric Regression: Splines and Kernel Regression
9-10 Multiple Testing
11-12 Online Experimentation: Sequential Hypothesis Testing, Bandits, Response Curves & Bayesian Optimization
13-14 Latent Variable Estimation: Probabilistic PCA, Kernel PCA, Autoencoders

Course Mechanics

Lectures Lectures will be held in person at the scheduled time.

Sections and Office Hours There is no GSI for this course. However, I will use the scheduled section times so you need to reserve them on your schedule. I will hold the paper discussions during the scheduled sections (see below), and for weeks in which there is no scheduled discussion, I will hold office hours during that time.

I will hold office hours in my office, Evans 433. There is also a Piazza page for asking simpler questions.

Online tools

We will use several different online services in the class that are probably familiar to you:

Resource Materials
Zoom Zoom Link
Meeting ID: 978 4214 8059
Passcode: 462003
(Lectures and office hours are generally in person; Zoom is used only when we are required to be remote.)
Bcourses We will post homework, solutions, practice midterms, etc. on Bcourses. Announcements, updates, and the like will also go out through Bcourses.
Piazza You can reach the Piazza page via Bcourses; clicking on the Piazza link should enroll you automatically in the Piazza class (with your Berkeley email).
Gradescope Assignments will be submitted and graded on Gradescope.

Graded Components

There are three main (graded) components to the class:

Homework/Lab assignments (50%) These will be rather standard assignments, consisting of theoretical manipulation and analysis of data. There will be roughly four over the semester. You are welcome to use either R or Python in analyzing the data, but I will mainly be able to support you in R (and the example code I give from my lectures will be in R).

An introductory tutorial I have written for R is available here; it also provides some tips for people who are familiar with Python.

Final Report (15%) You will be required to read a paper that goes beyond the topics we covered in class and write a report on it. I will provide a list of potential papers, though you may choose another paper if I approve it in advance and it is related to a topic covered in the course.

Discussions (35%) We will have in-class discussions over collections of papers around a theme. The theme will often be loosely connected to the topic we are covering, but may sometimes diverge. I will provide questions in advance that will guide our discussions, and you will be expected to turn in responses a day after the discussion.

Here is a sample of topics (and papers) I have covered in the past, though I will not cover them all and I may introduce new ones.

  • Randomization
    • The Arrangement of Field Experiments by RA Fisher
    • Statistical Problems in Agricultural Experimentation by J. Neyman
  • Adding Covariates in Randomized Experiments
    • Randomization does not justify logistic regression by D. Freedman
  • Philosophies of Probability
    • What is the chance of an Earthquake? by D. Freedman and P. Stark
    • Why isn’t everyone a Bayesian? by B. Efron
    • Objections to Bayesian Statistics by Andrew Gelman
  • Inference & P-values:
    • Testing Precise Hypotheses by James O. Berger and Mohan Delampady.
    • Scales of Evidence for Model Selection by Bradley Efron, Alan Gous, R Kass, G Datta, and P Lahiri.
  • Empirical Bayes
    • An Empirical Bayes Approach to Statistics by H. Robbins
    • Stein’s Paradox in Statistics by Bradley Efron and Carl Morris.
  • Reproducibility
    • Why most published research findings are false by J. Ioannidis
    • Deriving chemosensitivity from cell lines: Forensic bioinformatics and reproducible research in high-throughput biology by K. Baggerly and K. Coombes
  • Fairness in AI
    • Big data’s disparate impact, Part I by Solon Barocas and Andrew Selbst.
    • Fairness Through Awareness by Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold and Richard Zemel
    • Fairness-Aware Learning for Continuous Attributes and Treatments by Jeremie Mary, Clement Calauzenes, and Noureddine El Karoui

General Policies

Copyright: Course material, including all video content, is copyrighted. Reposting to third-party sites, or any other form of redistribution, is prohibited.

Email: I do not answer detailed questions about course materials by email. Please bring questions to office hours or Piazza. You may send me messages privately on Piazza, which is my preferred method of communication regarding course issues.

Students with disabilities: If you need accommodations for any disabilities, please have the DSP office send me the necessary information as soon as possible so that we can make arrangements in a timely manner.

Academic Integrity:

  • Any homework, test, or report that you submit and that bears your name is presumed to be your own original work.
  • Collaboration on homework should be limited to help that points others toward finding the solutions on their own, or to clarifying questions for your peers. You should not work the problems jointly.
  • Obtaining and/or using solutions from previous years or from the internet, if such happen to be available, is considered cheating.
  • You may not post or solicit solutions on Piazza. Please confine your questions and answers to guidance that will help others find their own way to the solutions.
  • Any instances of cheating will be reported to the Center for Student Conduct and the entire assignment will receive a zero.
