Moral Questions of Data Science (Spring 2019)

TENTATIVE --- SUBJECT TO CHANGE!!!

 

Philosophy 121: Moral Questions of Data Science

TuTh 8–9:30am

Location: Barrows 56

 

Instructor:

Niko Kolodny, kolodny@berkeley.edu

For office hours, see: http://sophos.berkeley.edu/kolodny

 

Graduate Student Instructors:

Rachel Rudolph, rachelrudolph@berkeley.edu

Office hours: Wed 12-1:30, 301 Moses Hall (or by appointment)

Google doc with students' discussion questions: https://docs.google.com/document/d/1qH8G3iAmqKLeNSEmnmyiU169bzLQYGnW8ZMuSwgw-lU/edit?usp=sharing

 

Peer tutoring program:

http://philosophy.berkeley.edu/undergraduate/faqs#33

 

Catalog Description:

This course explores, from a philosophical perspective, ethical questions arising from collecting, drawing inferences from, and acting on data, especially when these activities are automated and on a large scale. Topics include: bias, fairness, discrimination, interpretability, privacy, paternalism, freedom of speech, and democracy. Three hours of lecture and one hour of discussion per week.

 

Prerequisites:

Prior coursework in philosophy will be helpful, but is not required.

 

Readings:

  • There is one book to buy: Cathy O'Neil, Weapons of Math Destruction.
  • All other readings are available from links on the online syllabus.  PDF links, however, are restricted to enrolled students, who have access to UC Berkeley libraries.

 

Requirements:

1. Read all assigned texts at least twice.

2. Attendance and participation at lecture.

  • After three unexcused absences from lecture, each additional unexcused absence will lower your final course grade by 0.1 grade points.
  • All devices must be put away, with ringers turned off. 
  • All discussion in this class must be conducted in a collegial and respectful manner. This includes discussions in class, in section, and online. Participation in a discussion can take many forms — you can add support to an idea, clarify it, distinguish it from related ideas, or disagree with it and challenge it. But however you choose to participate, the discussion must always be conducted with respect and civility. This is not always easy. There may be times when someone expresses an idea that strikes you as immoral, repugnant, or deeply offensive. If you wish to challenge the idea, be sure to target the idea itself rather than the person who expressed it. It is never appropriate to demean or denigrate fellow students and instructors.

3. Attendance and participation in section: 10%. 

4. First paper of 1-2 pages on utilitarianism: 10%

5. Second paper of 1-2 pages on O’Neil, Weapons: 10%

6. Third paper of 3-4 pages: 20%

7. Fourth paper of 3-4 pages: 20%

Optional: You may rewrite any one of these papers. The grade for the original will still count for 5%, but the grade for the rewrite will count for the rest.

8. Take-home final of 2-4 pages: 30%. You will have to answer FOUR questions out of SIX. Some answers will be longer than others, but they should average to between half a page and one page.  All of the questions will be review questions from the handouts.  You will have 48 hours to complete the final.  It will be assigned and submitted on bCourses.  It will be available from 8am May 15.

Optional: If you want to pursue a particular topic in greater depth, and can make a case for the relevance and interest of the topic, then, instead of the take-home final, you may write a 6-8 page term paper on the topic. A half-page description of the topic, along with a one-page outline, will be due on Apr. 11. It must be approved by the GSI before you can proceed. A first draft will be due on Apr. 25. A final draft will be due on May 17.

 

Handouts:

Publicly available in this folder:

http://sophos.berkeley.edu/kolodny/19SPhilos121Handouts

Also available in Course Files > Handouts:

https://bcourses.berkeley.edu/courses/1477033/files/folder/Handouts

Syllabus:

What aims should omnipotent artificial intelligence have?  An introduction to moral philosophy

Jan. 22

The aim of this course is to apply moral philosophy to the ethical problems raised by collecting, inferring from, and acting on data, especially when these activities are automated and at a large scale. By “applying moral philosophy,” we mean not only applying the doctrines that particular moral philosophers have espoused, but also, and more importantly, applying the methods of moral philosophy.

Perhaps the principal (and arguably the only) method starts with our clearer and more confident moral judgments about real or hypothetical particular cases: e.g., that a particular criminal sentence was unjust. It then works to formulate more basic and general values or principles that would justify those particular judgments and that, once articulated, strike us as genuinely important. If we’re lucky, the more basic and general values that we formulate then help us to arrive at settled judgments on less certain particular cases. But more often we find conflicts. Perhaps we can formulate values or principles that make sense of some of our particular judgments, but not others. Or perhaps the formulations, while capturing our particular judgments, don’t seem, on reflection, to be tracking anything important.

When it comes to real-world, or soon-to-be real-world, practices of data analysis, moral judgments about particular cases come at us from left, right, and center. You don’t even need to visit a site that specializes in tech, such as Wired. The New York Times or Wall Street Journal will do. Before long you’ll find some editorial or investigative piece decrying, explicitly or implicitly, the problematic—unjust, undemocratic, illiberal—character of this or that use of data analytics.

If such writing sounds the alarm, our task will be to evaluate whether we should be alarmed, why (if at all) we should be alarmed, and which alternatives, if any, to the practices in question would be improvements.

To acquaint yourself with this sort of commentary and reportage, you’ll start by reading Cathy O’Neil’s Weapons of Math Destruction. While we won’t discuss her book much in lecture, you will be asked to write a short paper that engages with it.

During this initial part of the course, our main aim in lecture will be instead to explore “utilitarianism.” This is both to get a feel for the methods of moral philosophy and to acquaint ourselves with perhaps the most influential moral philosophy of the past few centuries. Utilitarianism says, very simply, that the right thing to do in any situation is whatever would maximize pleasure and minimize pain.

To make things vivid: some think it likely that, by the end of the century, there will be artificial intelligence with the power to determine the future of humanity. We may get only one chance to influence what it aims to do. Should we try to get it to aim to maximize pleasure? Or, on a smaller scale, do we want our self-driving car or Alexa to maximize pleasure?

Jan. 24

Utilitarianism has three main components: hedonism, aggregation, and consequentialism. Hedonism says that your life goes better for you just insofar as it has more pleasure and less pain. Is that right? Or does it matter whether you get what you want? Or whether you have meaningful relationships and worthwhile achievements? Or whether you direct your life according to your own free and informed choices?

Jan. 29

Aggregation says that one outcome is better than another just when the sum total of what’s good for people (e.g., pleasure) is greater. Why then should we punish anyone for crimes? Why should we punish only people who have committed crimes? And why not persecute a minority, if this will give pleasure to a much larger majority?

Jan. 31

Feb. 5: First and second papers assigned

The last component of utilitarianism, consequentialism, says that the right action produces the best outcome. But can the right action sometimes have a worse outcome? It’s certainly a better outcome if only one innocent person dies rather than five innocent people. But are we allowed to kill one innocent person even to save the lives of five other innocent people? 

Feb. 7

When is a use of data biased?  When is it fair?  Why does it matter?

With some practice thinking as moral philosophers, we now turn to specific ethical challenges posed by data analytics. If you came away from O’Neil’s Weapons with nothing else, you came away with the points, first, that a variety of algorithms look for patterns in data about us, and, second, that knowledge of these patterns is then used in a variety of ways, often with great significance for our lives.

In our next few lectures, we discuss some specific, real-world examples, including the following: (i) patterns in where police have observed certain crimes occurring are used to decide where best to deploy police to deter or catch future crimes, (ii) patterns in personal characteristics are used to predict whether someone is likely to commit other crimes if released on bail while awaiting trial, and (iii) patterns in a given set of facial images are used to train an algorithm to recognize whose face appears in new images. In each example, as we will see, a case can be made that the resulting algorithm is in some sense biased, in a way that unfairly disadvantages racial minorities.

Yet the algorithms themselves, it seems, don’t have racial biases, not even the implicit biases that perhaps all of us have, to some degree. So what’s going on? It’s not always easy to say. But in some cases, it seems, the “data itself” may be biased: because the data reflects the biased judgments of people, because data is collected on some groups more intensively than others, or because a random sample will almost certainly include fewer data points on members of minorities, simply because minorities are, by definition, smaller groups.
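To make the sampling point concrete, here is a minimal simulation in Python (the numbers are invented for illustration and are not drawn from any of the assigned readings). Even when the underlying rate is identical in both groups, the estimate for the smaller group bounces around far more, simply because each random sample contains fewer of its members:

    # A minimal simulation of one source of "biased data": in a simple random
    # sample, the minority group contributes far fewer observations, so anything
    # estimated about it from the data is noisier.  All numbers are invented.
    import numpy as np

    rng = np.random.default_rng(0)
    population_share = {"majority": 0.9, "minority": 0.1}
    true_rate = 0.30                      # same underlying rate in both groups
    sample_size = 1_000
    n_trials = 2_000

    estimates = {group: [] for group in population_share}
    for _ in range(n_trials):
        for group, share in population_share.items():
            n = rng.binomial(sample_size, share)      # group's count in the sample
            positives = rng.binomial(n, true_rate)
            estimates[group].append(positives / max(n, 1))

    for group, ests in estimates.items():
        print(f"{group}: mean estimate {np.mean(ests):.3f}, "
              f"spread across samples {np.std(ests):.3f}")
    # Both means sit near 0.30, but the minority estimate's spread is roughly
    # three times larger -- an artifact of sample size, not of anyone's attitudes.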

What, then, would an “unbiased” or “fair” algorithm look like? In many cases, it’s not obvious, which may lead us to question our initial judgment of bias or unfairness.  

The designers of the bail algorithm targeted in the ProPublica exposé, for example, offer a reasonable defense. Indeed, Kleinberg et al. prove that, apart from uninteresting, degenerate cases, no algorithm can jointly satisfy three criteria of fairness, each of which seems perfectly reasonable on its own. In other words, they prove that no matter what the algorithm had been, ProPublica could have written some exposé.
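For a feel of what Kleinberg et al. make precise, here is a small numerical sketch in Python. The numbers are made up (they are not the COMPAS data, and this is not Kleinberg et al.'s proof): a score can be perfectly calibrated within each of two groups, and yet, because the groups' base rates differ, a single threshold rule built on that score yields different false positive and false negative rates for the two groups.

    # Two groups, each scored by a risk tool that is calibrated within the group:
    # among people given score s, a fraction s actually reoffend.  The base rates
    # differ, and that alone pulls the error rates apart.  Invented numbers.
    groups = {
        # each entry: (share of the group at this score, the score itself)
        "A": [(0.5, 0.8), (0.5, 0.4)],   # base rate = 0.5*0.8 + 0.5*0.4 = 0.60
        "B": [(0.5, 0.5), (0.5, 0.1)],   # base rate = 0.5*0.5 + 0.5*0.1 = 0.30
    }
    threshold = 0.45                      # flag anyone scoring above this

    for name, cells in groups.items():
        positives = sum(share * s for share, s in cells)   # eventual reoffenders
        negatives = 1 - positives
        false_pos = sum(share * (1 - s) for share, s in cells if s > threshold)
        false_neg = sum(share * s for share, s in cells if s <= threshold)
        print(f"group {name}: base rate {positives:.2f}, "
              f"false positive rate {false_pos / negatives:.2f}, "
              f"false negative rate {false_neg / positives:.2f}")

    # group A: base rate 0.60, false positive rate 0.25, false negative rate 0.33
    # group B: base rate 0.30, false positive rate 0.36, false negative rate 0.17
    # Each group's score is calibrated, yet the error rates differ; equalizing
    # them would break calibration.  Something has to give.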

It would seem that an algorithm treats people unfairly only if it disadvantages them.  But what counts as a “disadvantage”? Whereas Buolamwini claims, not implausibly, that algorithms that can’t detect people with darker skin disadvantage them, Hassein suggests, also not implausibly, that such algorithms may mitigate their disadvantage.

Feb. 12

Feb. 14: First papers due

Feb. 19

To make sense of our initial reaction that some algorithms are “unfair,” we are thus led to one of the most fundamental questions of moral philosophy. When is a decision about who gets what—e.g. a job, or a spot at Berkeley—fair? Why should we care about fairness, in that sense?

Recall that utilitarianism doesn’t care about fairness (except insofar as fairness maximizes pleasure). Rawls’s theory of “justice as fairness,” which he deliberately designed as an alternative to utilitarianism, presents a more complex view. When it comes to distributing economic goods, a just society may permit inequalities, but only so long as those inequalities work to the advantage of the worst-off group in society. At the same time, society must also give everyone equal opportunity to develop their talents and then place people in careers solely on the basis of their developed talents.

What would fair algorithms look like, on this conception of fairness? What sort of data would they be sensitive to, and in what way?

Feb. 21

Feb. 26

  • T. M. Scanlon, Why Does Inequality Matter?, Ch. 4-5

Charges of discrimination and unfairness are often made in the same breath. But not every kind of unfairness is discrimination, and not every case of discrimination is unfair, at least apart from its being discrimination. So what is discrimination, and why is it wrong? Hellman argues that discrimination is wrong because of what it expresses. It’s not just that, fairly or unfairly, someone doesn’t get a job, say, but also that the way in which they are denied the job communicates that simply because of their race, gender, or religion, they are somehow inferior.

If discrimination consists in communicating judgments of inferiority, however, then algorithms discriminate only if they communicate such judgments. Do algorithms communicate such judgments? Wrestling with related questions, Barocas and Selbst ask how far the categories of discrimination recognized by U.S. law apply to algorithms.

Feb. 28: Second paper due

Mar. 5

Do you have a right to be treated as an individual and not merely a statistic?  Why does it matter?

It seems wrong to base certain kinds of decisions purely on “statistical” evidence, as opposed to “direct” or “individualized” evidence. Suppose that eyewitness testimony, a form of direct evidence, is 90% accurate. And suppose that the fact that someone is a parolee, a form of statistical evidence, means there is a 91% chance that they will commit a crime in the first year of parole. If we can convict someone simply on the basis of eyewitness testimony, can we also convict someone simply on the basis of their being a parolee? It seems not. Yet over the long run, we will get it right more often if we rely on statistical evidence rather than direct evidence!
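For what it's worth, the long-run arithmetic behind that last sentence is easy to check; here is a small Python sketch using the hypothetical figures above:

    # Long-run error counts for two conviction policies, using the hypothetical
    # figures from the paragraph above.
    convictions = 1_000
    eyewitness_accuracy = 0.90        # "direct" / individualized evidence
    parolee_offense_rate = 0.91       # "statistical" evidence

    wrongful_if_eyewitness = convictions * (1 - eyewitness_accuracy)
    wrongful_if_statistical = convictions * (1 - parolee_offense_rate)

    print(f"convicting on eyewitness testimony: ~{wrongful_if_eyewitness:.0f} wrongful per {convictions}")
    print(f"convicting on parolee status alone: ~{wrongful_if_statistical:.0f} wrongful per {convictions}")
    # ~100 vs. ~90: the purely statistical policy makes slightly fewer mistakes,
    # which is what makes the intuitive asymmetry between the two so puzzling.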

So what’s the problem with relying on statistical evidence? Moreover, in other contexts, it seems wrong to use individualized evidence rather than statistical evidence. For example, it’s OK for your insurer to raise your health insurance premium next year if it finds that it underestimated the rate of breast cancer in the population. But is it OK for it to raise your premium because it discovered that you in particular have a genetic predisposition to breast cancer?

As Eidelson observes, the objection to discrimination is sometimes said to be that it does not treat one as an individual, but instead as a member of a group.

In sum, do we have a right to be treated as an individual, rather than a member of a group? Do we have a right not to be treated as an individual? When and why?

Mar. 7: Third paper assigned

  • Fred Schauer, Profiles, Probabilities, and Stereotypes, Intro., Ch. 3

[For anyone interested in further reading on the topic: Judith Thomson, "Liability and Individualized Evidence"; David T. Wasserman, "The Morality of Statistical Proof and the Risk of Mistaken Liability"; David Enoch, Levi Spectre and Talia Fisher, "Statistical Evidence, Sensitivity, and the Legal Value of Knowledge"]

Mar. 12

  • No new reading.  Instead, look over the review questions on the handouts so far.  Make a note of any that you find particularly challenging.  We'll discuss some together in class.

[For anyone interested in further reading, Fred Schauer, Profiles, Probabilities, and Stereotypes, Ch. 5, 7]

Mar. 14

Do algorithms owe us an explanation?  Does it matter whether we understand why they do what they do?

In general, it’s a problem if an algorithm isn’t accurate: for example, if it delivers lots of false positives and lots of false negatives. And, as we have seen, it is a problem if an algorithm is unfair or discriminatory. But do you have a distinct complaint if you were affected by an algorithmic decision (e.g., denied a loan) but did not receive an explanation of why the algorithm decided your case as it did? Note that it’s not exactly a question of someone hiding from you the reasons for a decision. Increasingly, no one may be able to explain why the algorithm decided what it did, not even its designers.

Of course, an explanation may be evidence that an algorithm was or was not accurate or fair. But does it matter independently? We often don’t expect an explanation of certain decisions, even when they are made by human beings and even when they are very consequential. For example, someone convicted of a crime has no right to a jury’s explaining why it decided what it did. Or when a ballot proposition is passed banning same-sex marriage, same-sex couples have no right to the majority’s explaining its reasoning. Moreover, creatures of rationalization that we are, the grounds that we, human beings, offer for our decisions, even when perfectly sincere, may not be the “real” explanation of why we did what we did.

So when, if ever, do we have a right to an explanation, and why?

Mar. 19

When is a use of data an invasion of privacy?  Why does it matter?

Big data seems to threaten privacy. In part, this is because so much data about you is collected. But in part, as Kosinski et al. show, it is because so much data about other people has been collected that information about you can be inferred from even relatively few data points about you. Your IQ, for example, can be predicted by whether or not you like "curly fries" on Facebook.
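To see the shape of that inference, here is a minimal sketch in Python using scikit-learn and entirely invented data (this is not Kosinski et al.'s model, data, or page list): a classifier trained on many other users' likes can produce a guess about a new user from just a handful of likes.

    # Sketch of trait prediction from "likes": a model trained on many other
    # users' like patterns can guess something about a new user from only a few
    # likes.  Page names, data, and the trait are all invented for illustration.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(1)
    pages = ["curly fries", "thunderstorms", "a certain band", "a TV show"]

    # Pretend training data: one row per user, one 0/1 column per page liked,
    # plus a binary trait label the modeler cares about.
    n_users = 5_000
    likes = rng.integers(0, 2, size=(n_users, len(pages)))
    # Invented ground truth: the trait is weakly associated with the first page.
    trait = (0.3 * likes[:, 0] + rng.normal(0, 0.5, n_users)) > 0.4

    model = LogisticRegression().fit(likes, trait)

    # A new user about whom very little is known -- just two likes.
    new_user = np.array([[1, 0, 1, 0]])
    print("predicted probability of the trait:",
          round(model.predict_proba(new_user)[0, 1], 2))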

What is the privacy that is thus threatened? It seems to be a kind of control over “your personal information.” But what makes information “yours” or “personal” in the relevant sense? Why care about having this kind of control? One reason is to protect yourself from harm, as you might by locking your door or wearing a helmet. If online bullies can’t get your docs, then they can’t “dox” you, and so expose you to harassment. Another reason, which Thomson suggests may be the whole story, is simply control over how other people use your body or property. But Marmor suggests another, more distinctive reason: an interest in controlling how you present yourself to others. 

If a right to privacy is a right to control what others do with your information, then that right, presumably, is not violated if someone does something with your information with your consent. But when have you given consent? If I say, “Unless you jump straight up in the air ten feet, then you have consented to [here I whisper something that you can’t hear],” you haven’t consented. First, you couldn’t do otherwise, assuming you can’t jump ten feet. And, second, you have no idea what whispered thing you were consenting to. Roughly, consent must be free and informed. Just as Hume famously argued that mere residence in a country couldn't be interpreted as consent to obey its laws, one might wonder whether clicking "agree" on privacy agreements counts as consent to accept their terms, especially given the costs of reading such agreements, calculated by McDonald and Cranor. For perhaps related reasons, Nissenbaum is skeptical that informed consent is the right model for privacy protections.
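To get a rough sense of the kind of cost calculation McDonald and Cranor perform, here is a back-of-the-envelope sketch in Python; every number in it is a placeholder, not one of their figures:

    # Back-of-the-envelope cost of actually reading the privacy policies you
    # "accept".  Every input is an illustrative placeholder, not a figure from
    # McDonald and Cranor.
    policies_per_year = 1_000        # distinct sites and services encountered
    words_per_policy = 2_500         # length of a typical policy
    reading_speed_wpm = 250          # words read per minute

    minutes_per_year = policies_per_year * words_per_policy / reading_speed_wpm
    hours_per_year = minutes_per_year / 60
    print(f"about {hours_per_year:.0f} hours per year just reading the agreements")
    # With these placeholder inputs, roughly 170 hours -- more than four full-time
    # work weeks -- which is part of why click-through "consent" looks so thin.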

Under what sort of conditions would your right to privacy be adequately respected?  Is the European GDPR, which took effect in 2018, a step in the right direction?  Should there be a "right to be forgotten": to have Google not respond to searches for your name with links to embarrassing information from your past?

Mar. 21: Third paper due

Apr. 2

Apr. 4: Fourth paper assigned

Apr. 9

When do algorithms leave us free to choose?

Much of the data analyzed about you is data about what you seem to want (e.g., which posts you like, which videos you watch, which things you buy). And the point of analyzing it is to give you more of what you seem to want. What could be wrong with that?

However, there is a genuine question about whether it’s good for you to get what you want: to put it paradoxically, whether you ought to want to get what you want. According to hedonism, recall, what is good for you to get is what gives you pleasure, which may or may not be what you want, at least in the sense the algorithm is concerned with: what you are likely to click on. Perhaps you will click on things that make you feel bad, as Shakya and Christakis suggest. And, as we saw, there are other views about what makes for a good life, besides experiencing pleasure and getting what you want, such as having meaningful connections with other people.

But, it might be said, isn’t it “paternalistic” to use algorithms that give you not what you want, but instead what is good for you? Wouldn’t that be “imposing values” on you? When and why is that objectionable? We look at Mill’s classic answer, as well as a more recent answer from Scanlon.

Is there any alternative to “imposing values” on you? In many contexts, as Harris and Sunstein and Thaler observe, there may be no “neutral” way to present people with options.

Apr. 11: Descriptions and outlines for term papers due

Apr. 16

Apr. 18: Fourth paper due

  • Cass Sunstein and Richard Thaler, Nudge, Intro

Should algorithms tell us what we want to hear?

The analysis of data about you now substantially affects what information you get. The fact that you tend to click on stories about celebrities may mean that you get lots of stories about celebrities, but few about politics. The fact that you are a die-hard Republican may mean that a Democratic candidate for office never makes a case to you for your vote. For similar reasons, you may be able to communicate your message to certain people very well, but to other people not at all, because the data about them suggests that they will or won’t be interested in what you have to say. The transformation over the last decade has been quite striking (especially to those of us who grew up with only three television channels, where you had to wait—like some sort of animal!—a whole week for the next episode of whatever brainless show the network decided would appeal to the median viewer).

Is this a bad thing? Isn’t this, in effect, letting you decide what information you will get? It’s not as though having no one decide—that is, being barraged with randomly selected information, without any attention to your interests—would be an improvement. Nor was having other people decide what information you will get—having a handful of affluent, white, male editors in New York or D.C. decide what would appear in the newspaper—without its drawbacks. The question is how information should flow and for what purposes.

It might seem that letting what you want to hear about, as revealed by your past behavior, determine what you do hear about in the future is, whatever its flaws, at least ideal from the standpoints of democracy and free speech. After all, the algorithm is simply respecting your “vote” about what you want to hear, and no one is censoring anyone. But is this right? If democracy and freedom of speech are important insofar as they realize other values, then we can ask whether this sort of information flow does unambiguously realize those values.

We will read two classic discussions of free speech by Mill and Meiklejohn, as well as Gillespie's account of how platforms and search engines currently regulate the flow of information. Then we will consider some concerns raised by Sunstein and by Cohen and Fung about the effect of algorithms on the character of public discourse.

Apr. 23

Apr. 25: First draft of term papers due

Apr. 30

May 2

Conclusion

May 13: Rewrites due

May 15: Final exam assigned

May 17: Final exam or term paper due

Course Policies:

Extensions:

Plan ahead. You may request extensions from your GSI up until 72 hours before a paper is due. After that, extensions will be granted only for medical and family emergencies.

Submitting Work:

Papers must be submitted online, through the “Assignments” tab. Do NOT put your name on the paper. DO put your SID and the name of your GSI on the paper. If you do not want your paper to be shared with other students (e.g., as a model of something done well), then please indicate this on the paper. Papers submitted after the deadline will lose one step (e.g., B+ to B) immediately and then an additional step every 24 hours. Any mishaps in electronic submission are your responsibility: forgotten attachments, unopenable files, bounced or lost emails, and so on.

Turnitin:

Turnitin will be enabled for all papers. Your submission will be compared to a database of other papers and materials. (It will be added to the database too, but only for the purposes of future comparisons within UC Berkeley.) About 15 minutes after you submit, an originality report will be generated, which should be visible to you on the page where you submitted. Don’t worry if the report shows some incidental matches! That's almost inevitable, even if your work is entirely your own.

It’s overwhelmingly unlikely that what I will now go on to say applies to you. Academic dishonesty is far less common than is often thought. But, for the record: What if the originality report catches something that shows that your work was not entirely your own? What do you do? Take a deep breath, rewrite the paper so that it is entirely your own work, and resubmit. Your history of submissions and their originality reports will be visible to us. But so long as your final submission is your own work (and not, say, the product of just enough tweaking to get past Turnitin), we will ignore the earlier submissions as though they never happened. The aim is not to catch anyone, just to make sure that everyone fulfills the course requirements.

“Re-grading”:

You are strongly encouraged to discuss grades and comments on papers with your GSI or me. However, grades on particular papers and exams will not be changed under any circumstances. While there is no perfect system, selective “re-grading” at students’ request only makes things worse. “Second” grades are likely to be less accurate and less fair than “first” grades. This is because, among other things, the GSI does not have access to other papers for purposes of comparison, the student will inevitably supply additional input (clarifications, explanations, etc.) that the original paper did not, and there are certain biases of self-selection. The only exception, to which none of these concerns apply, is a suspected arithmetical or recording error in your final course grade. Please do not hesitate to bring this to your GSI’s or my attention.

Academic Dishonesty:

Plagiarism or cheating will result in an “F” in the course as a whole and a report to Student Judicial Affairs. A (finally submitted) paper with a Turnitin report showing more than 10% material from other sources is likely to be viewed as plagiarism.

Any test, paper or report submitted by you and that bears your name is presumed to be your own original work that has not previously been submitted for credit in another course unless you obtain prior written approval to do so from your instructor.

In all of your assignments, including your homework or drafts of papers, you may use words or ideas written by other individuals in publications, web sites, or other sources, but only with proper attribution. ‘Proper attribution’ means that you have fully identified the original source and extent of your use of the words or ideas of others that you reproduce in your work for this course, usually in the form of a footnote or parenthesis.

—Academic Dishonesty and Plagiarism Subcommittee, June 18, 2004.

Accommodations for Students with Disabilities:

If you have an official accommodation letter that is relevant to this course, please notify both me and your GSI at a reasonable time. We will do whatever we can to help.

Our Policy on Sexual Violence and Harassment:

Our goal is that this classroom is a participatory community where everyone can fulfill their potential for learning; there is no place for sexual harassment or violence. If your behavior harms another person in this class, you may be removed from the class temporarily or permanently, or from the University. If you or someone you know experiences sexual violence or harassment, there are options, rights, and resources, including assistance with academics, reporting, and medical care. Visit survivorsupport.berkeley.edu or call the 24/7 Care Line at 510-643-2005.

           
