Course Syllabus

Instructor:
eyes robson, eyes.robson@berkeley.edu
Office hours: Mon 3-4p; Thurs 5-6p

Faculty sponsor:
Nilah Ioannidis, nilah@berkeley.edu

Course numbers: (to enroll on CalCentral)
1 hour section: 33440
2 hour section: 33441
(no access code necessary, leave blank!)

Meeting time (zoom): Wednesdays 3:10 to 5:00 pm <link in announcements>
Note: lectures will be recorded, although the post-lecture discussion portion may not be. This is a Pass / No Pass course, and attendance will be a small portion of the grade. If you anticipate needing to miss a few classes, feel free to reach out about how to receive makeup credit :)

Anonymous feedback / suggestions: <feedback link TBA>

Course description: AI is eating the world. Unfortunately, machine learning models have drawn attention in recent years for automating discrimination and being “biased”, a broad term with many distinct manifestations. Within genomics, a field full of complicated ML, these biases lead to dramatically different health outcomes for persons of different socioeconomic status or ancestry. Neither AI nor genomics can be fully “de-biased” in a single semester, but we hope to provide a detailed overview of existing problems and known frameworks to detect and prevent bias through the emerging field of algorithmic fairness.

This student-led discussion group (with multiple anticipated guest speakers!) is designed to equip the next generation of researchers with the awareness and structural competencies to address bias, and is divided into two consecutive parts.

Part 1 : Algorithmic Fairness in AI (approx. weeks 1-6, 8)

Part 2 : Fairness and Model Portability in Genomics (approx. weeks 7, 9-15)

CS-focused enrollees may consider the 1-hour enrollment option, while Genomics or Computational Biology enrollees may prefer the complete 2-hour option, time permitting. At the end of the course, we hope participants will be able to answer the following in their own terms:

What is fairness in machine learning, and what are common causes of unfairness?
How can cultural biases impact datasets and modeling assumptions? What kind of approaches can account for this situation?
(2-hour option) What parallels exist between algorithmic fairness and model portability in genomics?
(2-hour option) How can we account for patterns of genetic variation when analyzing genomic data?

Primary Texts: Fair ML book (https://fairmlbook.org/), Fatal Invention (Roberts, 2011), and the Language (Technology) is Power article

Format: The course has two enrollment options so that students with time constraints, or those primarily interested in fairness, can participate:

1 hour option: CS-focused enrollees can prioritize readings for days or guest lectures that best match their interests, and will be asked to complete fewer readings. This section may be ideal for CS students/researchers primarily interested in fairness rather than in genomic fairness, or for those with limited time availability.

2 hour option : attendance expected at most discussions -- suitable for those interested in fairness with respect to genomic and healthcare data. The final week will be dedicated to a discussion group of mini-projects and course reflections from 2 hour attendees (1 hour enrollees encouraged to attend!).

The course will be based primarily on weekly readings, with 55-65 minutes of each meeting dedicated to introducing a topic or research area and another 35-45 minutes dedicated to active discussion of the papers or presentations of ongoing research. There is one 2-hour section per week, with one-unit enrollees occasionally getting out early.

Audience: This course is targeted at upper-level undergraduate and new graduate students (Juniors, Seniors, 1st and 2nd year graduate students). Senior PhD students/researchers may also benefit!

Schedule (subject to change):

Date	Topic / Reading
September 1	Introductions // History of bias at UCB // Defining “fairness” // Covariate shift // Ancestry inference biases slides: https://docs.google.com/presentation/d/1vB3NhUiys4og7-vkMKEpUSl9LKAWF8V-9WAxPvFOZT4/edit?usp=sharing (optional) reading: Why algorithms can be racist and sexist (optional) reading: Whose genomics?
September 8	Allocational vs representational biases in ML // Power hierarchies in NLP and Medicine // Normative Reasoning // Pharmaceutical Data Gaps slides: https://docs.google.com/presentation/d/1RVAwWOM0vSMOAezwtKSlNs_Nsj3-Q-2XH-_n_Jpq7sY/edit?usp=sharing reading response form: https://forms.gle/SJ2KQX9dHXZ6Hz2y7 (due by 3:00 pm on Sep 8!) reading 1: pick one of the two (both will be discussed): choice a) Language (Technology) is Power (NLP centric) choice b) Invisible Women, Ch. 10: The Drugs Don’t Work (see the 'Files' section for PDFs) (optional) reading: Stochastic Parrots (optional) reading: ACL Language Diversity (optional) reading: Two types of harm
September 15	Formal non-discrimination criteria // Metrics // Algorithmic fairness // Intro to Causal Inference / Causal Statistics // slides: https://docs.google.com/presentation/d/1YeahCCEHFQsc931MZOH9fmBZffAd_Jk4c2EGJNtyaiY/edit?usp=sharing reading response form: https://forms.gle/xnfe9eCpka4RzFMo7 (due by 3:00 pm on Sep 15!) reading 2: fairML book, Chapter 2 https://fairmlbook.org/classification.html (optional) reading: Review of Statistical Independence (optional) reading: Correlation vs Causation If the notation this week is unclear or you aren’t fully comfortable with the stats terminology, please consider reaching out to eyes! :)
September 22	Model reporting & moratoria // Facial recognition // Gene therapy, CRISPR, and in vitro fertilization slides: https://docs.google.com/presentation/d/1ZNOXdcoZcMz5T14BbR_3bMKDfO7M4FbPDSTWzwlTS2s/edit?usp=sharing reading response form: https://forms.gle/2jbEwfkpPagaRGk6A reading 3: Model cards for model reporting >> feedback form in the 'Assignments' page << (optional) reading: Gender shades (optional) reading: How one employee's exit shook Google (optional) reading: The death of Jesse Gelsinger, 20 years later
September 29	Confounding and Proxy Variables // The “heritability” of intelligence // Redlining, Miscegeny, Homogamy, and Endogamy // Guest presenter on causal stats reading response form: https://forms.gle/ikMzJt68QzVijGP39 reading 4: Predicting A while hoping for B (also see week 4 in the Files section if you are off campus) slides: https://docs.google.com/presentation/d/1JOdeY29yYFjJPTLGJpqS3Km1DzmAVdN3_A76H5A8K9Y/edit?usp=sharing >> feedback form due 3:00 pm << (optional) reading: Dissecting racial bias in an algorithm used to manage the health of populations
October 6	Extractive vs. co-creative data/participatory research // Surveillance and “Broken windows” policing // Informed Consent // Medical mistrust & Biocolonialism // reading response form: https://forms.gle/PTHYUg6b2qqJfEh7A reading 5: Community-based research if off campus, see 'Files' slides: https://docs.google.com/presentation/d/1lhqThZZEl9kLeVloj6odhtHxb7xwW5daVcbTa_OntFI/edit?usp=sharing (optional) reading: Understanding and ameliorating medical mistrust (optional) reading: Genomic research through an indigenous lens (optional) reading: Genetic Research in Native Communities (optional) reading: 'Broken Windows' Theory (optional) reading: Big-Data Needs a Belmont 2.0
October 13	Physiognomy // Racial science // The medicalization of race // Population partiality // Allele frequency Intro // slides: https://docs.google.com/presentation/d/1vZ6-I_fDnkYEtXkSyd1qN959rvJUyKiItY9e8vU8ixE/edit?usp=sharing oct 13 response form: https://forms.gle/yDehNULptEftwVij9 oct 13 reading: Fatal Invention Chapter 1 (files) full text on UCB libraries (requires sign-in) -- here (optional) reading: Fatal Invention Preface (optional) reading: Physiognomy's New Clothes (optional) reading: Race and Genetics (optional) reading: Fatal Invention Ch. 2
October 20	Racial Medicine // Algorithmic Issues in Healthcare // Federated ML // Privacy and Security of Training Data // oct 20 response: https://forms.gle/1zi2nYm8B3Q3hwUT7 oct 20 reading: https://www.degruyter.com/document/doi/10.1525/9780520961944-051/html (see Week 7 for off-campus) alternative video: https://www.youtube.com/watch?v=KxLMjn4WPBY
October 27	De-racializing Genomics // Populations and Ancestry // Intro to stratification and portability / oct 27 response: https://forms.gle/Yb22T4dNhqetvUo6A oct 27 reading: Fatal Invention Chapter 3 -- see 'Week 8' in the Files section oct 27 slides: https://docs.google.com/presentation/d/12au2E0FBp4VrVAeZFupCN7BrG9wvx8KI9WqwYflIJpo/edit?usp=sharing oct 27 SPEAKER :) -- Alice Popejoy (a new faculty member at UC Davis)
November 3	Population stratification // Minor allele frequencies // GWAS + Pathogenicity prediction // Allocational biases, label biases, and data gaps in Pathogenicity Prediction // nov 3 slides: https://docs.google.com/presentation/d/1P2h_DJrWT2052wVl3lTbylFAf53h_FVL4vfmIaJ1rxc/edit?usp=sharing nov 3 response: https://forms.gle/JRDj9QxKaV5WmayK6 nov 3 reading: Challenges and Disparities... of Genomic Medicine to Populations with African Ancestry off-campus: See 'Week 9' in the Files section
November 10	GWAS 2 // Label bias // Linkage disequilibrium // nov 10 response: https://forms.gle/JRDe9i2jCWbW7Luy5 nov 10 reading: https://www.nature.com/articles/5201368 off-campus: See 'Week 10' in the Files section
November 17	Diverse fine-mapping // Admixture and recombination // Linkage // Intro to population-specific effect sizes // nov 17 response: https://forms.gle/GzZ6GNU6T2eBUsDN8 (due Nov 23rd) nov 17 reading: https://permalinks.23andme.com/pdf/23_21-PRSMethodology_May2020.pdf Guest speaker (postponed): nov 17 background: https://permalinks.23andme.com/pdf/23_19-Type2Diabetes_March2019.pdf nov 17 supplement: https://permalinks.23andme.com/pdf/23_21-PRSMethodologyAppendix_May2020.pdf
November 24	no meeting, Thanksgiving break >> if working in groups, please contact eyes by Nov 23 <<
December 1	Polygenic Risk Scores // Reflections and Conclusion nov 17 response: https://forms.gle/GzZ6GNU6T2eBUsDN8 (due Nov 23rd) nov 17 reading: https://permalinks.23andme.com/pdf/23_21-PRSMethodology_May2020.pdf final reading 1: https://www.biorxiv.org/content/10.1101/2021.03.18.435971v1.abstract final response 1: https://forms.gle/STJNDCAXiXEWFVR67 final reading 2: https://www.science.org/doi/10.1126/science.aan6877 final response 2: https://forms.gle/iNWRWwGX3yqCMNtY7 Guest speaker: James Ashenhurst, PhD (23andMe) (you might check the Nov 17th reading for a preview of James's material) (2 Hour-only) Short report/Presentations
December 8	RRR Week