Course Syllabus

Instructor:
eyes robson, eyes.robson@berkeley.edu
Office hours: Mon 3-4p; Thurs 5-6p

Faculty sponsor:
Nilah Ioannidis, nilah@berkeley.edu


Course numbers: (to enroll on CalCentral)
1 hour section: 33440
2 hour section: 33441
(no access code necessary, leave blank!)

Meeting time (zoom): Wednesdays 3:10 to 5:00 pm <link in announcements>
Note: lectures will be recorded, although the post-lecture discussion portion may not be. This is a Pass / No Pass course, and attendance will be a small portion of the grade. If you anticipate needing to miss a few classes, feel free to reach out about how to receive makeup credit :)

Anonymous feedback / suggestions: <feedback link TBA>

Course description: AI is eating the world. Unfortunately, machine learning models have drawn attention in recent years for automating discrimination and being “biased”, a broad term with many distinct manifestations. Within genomics, a field full of complicated ML, these biases lead to dramatically different health outcomes for persons of different socioeconomic status or ancestry. Neither AI nor genomics can be fully “de-biased” in a single semester, but we hope to provide a detailed overview of existing problems and known frameworks to detect and prevent bias through the emerging field of algorithmic fairness.

This student-led discussion group (with multiple anticipated guest speakers!) is designed to equip the next generation of researchers with the awareness and structural competencies to address bias, and is divided into two consecutive parts.

Part 1 : Algorithmic Fairness in AI (approx. weeks 1-6, 8)

Part 2 : Fairness and Model Portability in Genomics (approx. weeks 7, 9-15)

CS-focused enrollees may consider the 1-hour enrollment option, while Genomics or Computational Biology students may prefer the complete 2-hour option, time permitting. At the end of the course, we hope participants will be able to answer the following in their own terms:

  1. What is fairness in machine learning, and what are common causes of unfairness?
  2. How can cultural biases impact datasets and modeling assumptions? What kind of approaches can account for this situation?
  3. (2-hour option) What parallels exist between algorithmic fairness and model portability in genomics?
  4. (2-hour option) How can we account for patterns of genetic variation when analyzing genomic data?

Primary Texts: Fair ML book -- https://fairmlbook.org/, Fatal Invention (Roberts, 2011), and the Language (Technology) is Power article

Format: We will have two enrollment options for the course so that students with time constraints, or those primarily interested in fairness, can participate:

1 hour option: CS-focused enrollees can prioritize readings for days or guest lectures that best match their interests, and will be asked to complete fewer readings. This section may be ideal for CS students/researchers primarily interested in fairness rather than in genomic fairness, or for those with limited time availability.

2 hour option: attendance expected at most discussions -- suitable for those interested in fairness with respect to genomic and healthcare data. The final week will be dedicated to a discussion group of mini-projects and course reflections from 2 hour attendees (1 hour enrollees encouraged to attend!).

The course will be primarily based around weekly readings, with 55-65 minutes of each discussion dedicated to introducing a topic/research area, and another 35-45 minutes dedicated to active discussion of the papers or presentations of ongoing research. (one 2-hour section per week, with one-unit enrollees occasionally getting out early)

Audience: This course is targeted at upper-level undergraduate and new graduate students (Juniors, Seniors, 1st and 2nd year graduate students). Senior PhD students/researchers may also benefit!


Schedule (subject to change):

Date

Topic / Reading

September 1

Introductions // History of bias at UCB // Defining “fairness” // Covariate shift // Ancestry inference biases 

slides: https://docs.google.com/presentation/d/1vB3NhUiys4og7-vkMKEpUSl9LKAWF8V-9WAxPvFOZT4/edit?usp=sharing 

(optional) reading: Why algorithms can be racist and sexist

(optional) reading: Whose genomics?

September 8

Allocational vs representational biases in ML // Power hierarchies in NLP and Medicine // Normative Reasoning // Pharmaceutical Data Gaps

slides: https://docs.google.com/presentation/d/1RVAwWOM0vSMOAezwtKSlNs_Nsj3-Q-2XH-_n_Jpq7sY/edit?usp=sharing 

reading response form: https://forms.gle/SJ2KQX9dHXZ6Hz2y7 
(due by 3:00 pm on Sep 8!)

reading 1: pick one of the two (both will be discussed):
choice a) Language (Technology) is Power (NLP centric)
choice b) Invisible Women, Ch. 10: The Drugs Don’t Work

(see the 'Files' section for PDFs)

(optional) reading: Stochastic Parrots
(optional) reading: ACL Language Diversity
(optional) reading: Two types of harm

September 15

Formal non-discrimination criteria // Metrics // Algorithmic fairness // Intro to Causal Inference / Causal Statistics //

slides: https://docs.google.com/presentation/d/1YeahCCEHFQsc931MZOH9fmBZffAd_Jk4c2EGJNtyaiY/edit?usp=sharing 

reading response form: https://forms.gle/xnfe9eCpka4RzFMo7 
(due by 3:00 pm on Sep 15!)

reading 2: fairML book, Chapter 2 https://fairmlbook.org/classification.html 

(optional) reading: Review of Statistical Independence 
(optional) reading: Correlation vs Causation

If the notation this week is unclear or you aren’t fully comfortable with the stats terminology, please consider reaching out to eyes! :)

September 22

Model reporting & moratoria // Facial recognition // Gene therapy, CRISPR, and in vitro fertilization

slides: https://docs.google.com/presentation/d/1ZNOXdcoZcMz5T14BbR_3bMKDfO7M4FbPDSTWzwlTS2s/edit?usp=sharing 

reading response form: https://forms.gle/2jbEwfkpPagaRGk6A 
reading 3: Model cards for model reporting
 
>> feedback form in the 'Assignments' page <<

(optional) reading: Gender shades

(optional) reading: How one employee's exit shook Google

(optional) reading: The death of Jesse Gelsinger, 20 years later

September 29

Confounding and Proxy Variables // The “heritability” of intelligence // Redlining, Miscegenation, Homogamy, and Endogamy // Guest presenter on causal stats 

reading response form: https://forms.gle/ikMzJt68QzVijGP39 

reading 4: Predicting A while hoping for B 
(also see week 4 in the Files section if you are off campus)

slides: https://docs.google.com/presentation/d/1JOdeY29yYFjJPTLGJpqS3Km1DzmAVdN3_A76H5A8K9Y/edit?usp=sharing 
>> feedback form due 3:00 pm <<

(optional) reading: Dissecting racial bias in an algorithm used to manage the health of populations

October 6

Extractive vs. co-creative data/participatory research // Surveillance and “Broken windows” policing // Informed Consent // Medical mistrust & Biocolonialism //

reading response form: https://forms.gle/PTHYUg6b2qqJfEh7A 

reading 5: Community-based research
if off campus, see 'Files'

slides: https://docs.google.com/presentation/d/1lhqThZZEl9kLeVloj6odhtHxb7xwW5daVcbTa_OntFI/edit?usp=sharing 

(optional) reading: Understanding and ameliorating medical mistrust
(optional) reading: Genomic research through an indigenous lens 
(optional) reading: Genetic Research in Native Communities
(optional) reading: 'Broken Windows' Theory 
(optional) reading: Big-Data Needs a Belmont 2.0

October 13

Physiognomy // Racial science // The medicalization of race // Population partiality // Allele frequency Intro //

slides: https://docs.google.com/presentation/d/1vZ6-I_fDnkYEtXkSyd1qN959rvJUyKiItY9e8vU8ixE/edit?usp=sharing 
oct 13 response form: https://forms.gle/yDehNULptEftwVij9 
oct 13 reading: Fatal Invention Chapter 1 (files)
full text on UCB libraries (requires sign-in) -- here

(optional) reading: Fatal Invention Preface
(optional) reading: Physiognomy's New Clothes
(optional) reading: Race and Genetics
(optional) reading: Fatal Invention Ch. 2

October 20

Racial Medicine // Algorithmic Issues in Healthcare // Federated ML // Privacy and Security of Training Data // 

oct 20 response: https://forms.gle/1zi2nYm8B3Q3hwUT7 

oct 20 reading: https://www.degruyter.com/document/doi/10.1525/9780520961944-051/html 
(see Week 7 for off-campus)
alternative video: https://www.youtube.com/watch?v=KxLMjn4WPBY 

October 27

De-racializing Genomics // Populations and Ancestry // Intro to stratification and portability //

oct 27 response: https://forms.gle/Yb22T4dNhqetvUo6A 
oct 27 reading: Fatal Invention Chapter 3 -- see 'Week 8' in the Files section

oct 27 slides: https://docs.google.com/presentation/d/12au2E0FBp4VrVAeZFupCN7BrG9wvx8KI9WqwYflIJpo/edit?usp=sharing 

oct 27 SPEAKER :)  -- Alice Popejoy (a new faculty member at UC Davis)

November 3

Population stratification // Minor allele frequencies // GWAS + Pathogenicity prediction // Allocational biases, label biases, and data gaps in Pathogenicity Prediction // 


nov 3 slides: https://docs.google.com/presentation/d/1P2h_DJrWT2052wVl3lTbylFAf53h_FVL4vfmIaJ1rxc/edit?usp=sharing

nov 3 response: https://forms.gle/JRDj9QxKaV5WmayK6

nov 3 reading: Challenges and Disparities... of Genomic Medicine to Populations with African Ancestry
off-campus: See 'Week 9' in the Files section

November 10

GWAS 2 // Label bias // Linkage disequilibrium //

nov 10 response: https://forms.gle/JRDe9i2jCWbW7Luy5 

nov 10 reading: https://www.nature.com/articles/5201368
off-campus: See 'Week 10' in the Files section

November 17

Diverse fine-mapping // Admixture and recombination // Linkage // Intro to population-specific effect sizes // 

nov 17 response: https://forms.gle/GzZ6GNU6T2eBUsDN8 (due Nov 23rd)
nov 17 reading: https://permalinks.23andme.com/pdf/23_21-PRSMethodology_May2020.pdf 
Guest speaker (postponed):

nov 17 background: https://permalinks.23andme.com/pdf/23_19-Type2Diabetes_March2019.pdf 
nov 17 supplement: https://permalinks.23andme.com/pdf/23_21-PRSMethodologyAppendix_May2020.pdf 

November 24

no meeting, Thanksgiving break
>> if working in groups, please contact eyes by Nov 23 <<

December 1

Polygenic Risk Scores // Reflections and Conclusion

nov 17 response: https://forms.gle/GzZ6GNU6T2eBUsDN8 (due Nov 23rd)
nov 17 reading: https://permalinks.23andme.com/pdf/23_21-PRSMethodology_May2020.pdf 

final reading 1: https://www.biorxiv.org/content/10.1101/2021.03.18.435971v1.abstract
final response 1: https://forms.gle/STJNDCAXiXEWFVR67  
final reading 2:  https://www.science.org/doi/10.1126/science.aan6877 
final response 2: https://forms.gle/iNWRWwGX3yqCMNtY7 

Guest speaker:  James Ashenhurst, PhD (23andMe) 
(you might check the Nov 17th reading for preview of James's material) 

(2 Hour-only) Short report/Presentations 

December 8

RRR Week