Course Syllabus
Instructor:
eyes robson, eyes.robson@berkeley.edu
Office hours: Mon 3-4p; Thurs 5-6p
Faculty sponsor:
Nilah Ioannidis, nilah@berkeley.edu
Course numbers: (to enroll on CalCentral)
1 hour section: 33440
2 hour section: 33441
(no access code necessary, leave blank!)
Meeting time (Zoom): Wednesdays 3:10 to 5:00 pm <link in announcements>
Note: lectures will be recorded, though the post-lecture discussion portion may not be. This is a Pass / No Pass course, and attendance will be a small portion of the grade. If you anticipate needing to miss a few classes, feel free to reach out about how to receive makeup credit :)
Anonymous feedback / suggestions: <feedback link TBA>
Course description: AI is eating the world. Unfortunately, machine learning models have drawn attention in recent years for automating discrimination and for being “biased”, a broad term with many distinct manifestations. Within genomics, a field that relies heavily on complicated ML, these biases lead to dramatically different health outcomes for people of different socioeconomic statuses or ancestries. Neither AI nor genomics can be fully “de-biased” in a single semester, but we hope to provide a detailed overview of existing problems and of known frameworks for detecting and preventing bias through the emerging field of algorithmic fairness.
This student-led discussion group (with multiple anticipated guest speakers!) is designed to equip the next generation of researchers with the awareness and structural competencies to address bias, and is divided into two consecutive parts.
Part 1 : Algorithmic Fairness in AI (approx. weeks 1-6, 8)
Part 2 : Fairness and Model Portability in Genomics (approx. weeks 7, 9-15)
CS-focused enrollees may consider the 1-hour enrollment option, while Genomics or Computational Biology enrollees may prefer the complete 2-hour option, time permitting. At the end of the course, we hope participants will be able to answer the following in their own terms:
- What is fairness in machine learning, and what are common causes of unfairness?
- How can cultural biases impact datasets and modeling assumptions? What kinds of approaches can account for this?
- (2-hour option) What parallels exist between algorithmic fairness and model portability in genomics?
- (2-hour option) How can we account for patterns of genetic variation when analyzing genomic data?
Primary Texts: Fair ML book (https://fairmlbook.org/); Fatal Invention (Roberts, 2011); the article “Language (Technology) is Power”
Format: We will have two enrollment options for the course, to allow students with time constraints, or those primarily interested in fairness, to participate:
1 hour option: enrollees can prioritize readings for the days or guest lectures that best match their interests, and will be asked to complete fewer readings overall. This section may be ideal for CS students/researchers primarily interested in algorithmic fairness rather than genomic fairness, or for those with limited time availability.
2 hour option: attendance is expected at most discussions -- suitable for those interested in fairness with respect to genomic and healthcare data. The final week will be dedicated to a discussion of mini-projects and course reflections from 2 hour attendees (1 hour enrollees are encouraged to attend!).
The course will be primarily based around weekly readings. Of each discussion, 55-65 minutes will be dedicated to introducing a topic/research area, and another 35-45 minutes to active discussion of the papers or presentations of ongoing research. (There is one 2-hour section per week; one-unit enrollees will occasionally get out early.)
Audience: This course is targeted at upper-level undergraduate and new graduate students (juniors, seniors, and 1st- and 2nd-year graduate students). Senior PhD students/researchers may also benefit!
Schedule (subject to change):
September 1:
Introductions // History of bias at UCB // Defining “fairness” // Covariate shift // Ancestry inference biases
(optional) reading: Why algorithms can be racist and sexist
(optional) reading: Whose genomics?

September 8:
Allocational vs. representational biases in ML // Power hierarchies in NLP and Medicine // Normative Reasoning // Pharmaceutical Data Gaps
slides: https://docs.google.com/presentation/d/1RVAwWOM0vSMOAezwtKSlNs_Nsj3-Q-2XH-_n_Jpq7sY/edit?usp=sharing (see the 'Files' section for PDFs)
(optional) reading: Stochastic Parrots

September 15:
Formal non-discrimination criteria // Metrics // Algorithmic fairness // Intro to Causal Inference / Causal Statistics
reading response form: https://forms.gle/xnfe9eCpka4RzFMo7 (due by 3:00 pm on Sep 15!)
reading 2: Fair ML book, Chapter 2: https://fairmlbook.org/classification.html
(optional) reading: Review of Statistical Independence
If the notation this week is unclear or you aren't fully comfortable with the stats terminology, please consider reaching out to eyes! :)

September 22:
Model reporting & moratoria // Facial recognition // Gene therapy, CRISPR, and in vitro fertilization
slides: https://docs.google.com/presentation/d/1ZNOXdcoZcMz5T14BbR_3bMKDfO7M4FbPDSTWzwlTS2s/edit?usp=sharing
(optional) reading: Gender Shades
(optional) reading: How one employee's exit shook Google
(optional) reading: The death of Jesse Gelsinger, 20 years later

September 29:
Confounding and Proxy Variables // The “heritability” of intelligence // Redlining, Miscegenation, Homogamy, and Endogamy // Guest presenter on causal stats
reading 4: Predicting A while hoping for B

October 6:
Extractive vs. co-creative data / participatory research // Surveillance and “broken windows” policing // Informed Consent // Medical mistrust & Biocolonialism
(optional) reading: Understanding and ameliorating medical mistrust

October 13:
Physiognomy // Racial science // The medicalization of race // Population partiality // Allele frequency intro

October 20:
Racial Medicine // Algorithmic Issues in Healthcare // Federated ML // Privacy and Security of Training Data
Oct 20 response: https://forms.gle/1zi2nYm8B3Q3hwUT7

October 27:
De-racializing Genomics // Populations and Ancestry // Intro to stratification and portability
Oct 27 response: https://forms.gle/Yb22T4dNhqetvUo6A

November 3:
Population stratification // Minor allele frequencies // GWAS + Pathogenicity prediction // Allocational biases, label biases, and data gaps in pathogenicity prediction
Nov 3 reading: Challenges and Disparities... of Genomic Medicine to Populations with African Ancestry

November 10:
GWAS 2 // Label bias // Linkage disequilibrium
Nov 10 reading: https://www.nature.com/articles/5201368 (off-campus: see 'Week 10' in the Files section)

November 17:
Diverse fine-mapping // Admixture and recombination // Linkage // Intro to population-specific effect sizes
Nov 17 response: https://forms.gle/GzZ6GNU6T2eBUsDN8 (due Nov 23rd)

November 24:
No meeting (Thanksgiving break)

December 1:
Polygenic Risk Scores // Reflections and Conclusion

December 8:
RRR Week