Transcript for:
Privacy in a Data-Driven World

So, hi everyone. Thanks for having me here at Stanford. As you mentioned, I'm Blase, from the University of Chicago, and I'll be talking about a series of projects I've done with my students in the SUPERgroup. We're a big collective of researchers focusing on security, privacy, human-computer interaction, ethical AI systems, and now large language models like everyone else; whatever my students find interesting, I find interesting too. There are a lot of different things I could have talked about today, in lots of different topic areas, but I decided to focus today's talk exclusively on privacy.

To set the stage, I'll go through an exercise I use when I teach undergraduate classes on privacy, to introduce what we mean by privacy. Specifically, I'm going to talk about a book, a photography book, and you might wonder what that has to do with privacy; we'll get there. It also, of course, has to do with one of my hobbies, of which I have too many according to everyone from my PhD adviser to my sister. Photography is one of my favorite things. When I was thinking about moving to Chicago and accepting this job at the University of Chicago, I went to one of Chicago's many great museums, one focused on photography, and saw this book. It's called The Transparent City, by the photographer Michael Wolf. It's architectural photography in a big coffee-table book. I started flipping through it, and it's really beautiful: it shows the growth of the downtown Chicago area, the Loop, River North, these amazing buildings, beautiful symmetry, lots of really straight lines, some wavy lines, lots of lights. If you're a fan of twenty-year-old dad rock, you'd probably recognize the buildings on the right as a famous album cover.

So I was flipping through, admiring these beautiful photos, and I came across a photo that, it turns out, had caught the photographer's eye, and it caught my eye too. I'm from New Jersey, so I immediately recognized it as our official state gesture: a person giving you the finger. So what's the deal with this? This person was caught in one of the photos once the photographer zoomed in. To quote the photographer: "One evening I was looking at a photograph I had shot and I saw in it a man giving me the middle finger. It set off a chain reaction in me, and I began to look through every file at 200% magnification to see what else was going on in those windows. I saw hands on computer mice and family photographs on the desks of CEOs. I saw people watching flat-screen TVs." And so this book is actually half architectural photos and half zoomed-in photos, or photos taken with telephoto lenses, of people inside their homes or their offices.

I think it's a really interesting way to get my classes talking about privacy: is this a privacy violation? Are these photographs a privacy violation? What I typically find is that some students will say yes, some will say no, and most of the time students will say, well, it depends, and they start asking questions. They start by asking, first of all, did people know they were being photographed beforehand, or even afterwards?
Did they consent to this, and was it affirmative consent? Are they being paid for this? How are these photos being used; are they being used commercially? They look through the different photos, and sometimes they'll say, well, it depends on what the people are doing: maybe a person having dinner is not a privacy violation, but a person with some regrettable tattoos, or a person watching TV with bad posture, maybe those are privacy violations. They start asking even more questions. Some of these people are in offices, and students will say, maybe it's not a privacy violation if they're in their office, but it is if they're in their home; they start differentiating there. They'll also ask questions like, what floor were these people on? Maybe if they're on the first or second, maybe even third floor, it's not a privacy violation, because anyone walking down the street can see them, but if they're on an upper floor, maybe it is a privacy violation, because people walking down the street couldn't see it. And then they get into discussions of, well, the upper floors are usually a lot more expensive, so is privacy a right only for the rich? They ask even more questions. They'll look at some of these pictures and say, this person's not identifiable; is being identifiable necessary to consider it a privacy violation? They look at other people and say, well, it's pixelated; maybe the pixelation makes it not a privacy violation, or maybe it doesn't matter; maybe what activity the person is engaging in matters. Is a facial recognition system being used? Can we identify these people post hoc? Are they in some databases? All these externalities factor into whether they say it's a privacy violation or not.

And then I like to save some of the images for towards the end: people where the photographer shot through blinds. The blinds aren't stopping the photographer from photographing these people, but is this a signaling mechanism? Does the fact that these people had their blinds down signal that they didn't want their photo taken? How do we think about signaling mechanisms?

So overall, this gets the students talking, and privacy ends up being really complicated. "What is privacy?" is a question that's been asked by lots of people over the years: technologists, legal scholars, philosophers, communications experts. A lot of this, in the US at least, traces back to one of the first articulations of a right to privacy, in 1890, by Samuel Warren and Louis Brandeis, the latter of whom later in his career became a Supreme Court Justice. They were basically asking, do we have a right to privacy? And they were actually inspired by the rise of instantaneous photography, which I think is a really nice parallel to how Michael Wolf's photography book got me thinking about this. They noticed that at high-society gatherings people were suddenly being photographed; it was the gossip magazines of the 1890s. And so they centered on the idea that it seems like we should have the right to be let alone. They go through a long legal analysis of other possible rights, and they conclude that the right to privacy seems like something different:
the right to say, I don't want to participate in this; leave me alone. Over the 20th century a lot of great scholars refined this right. Alan Westin talked about a notion of privacy as control: you should be able to control information disclosures. But then social psychologists started thinking, well, sometimes you want to give away information; do you basically give up your privacy rights if you give away information of your own volition? So Irwin Altman, Sandra Petronio, and others started refining what this could mean. And this refinement has continued into the 21st century. You have Dan Solove, a legal scholar, who wrote these beautiful articles; one of my favorites, I actually forget exactly what it's called, is about the "nothing to hide" argument, and how the argument that if you have nothing to hide you have nothing to fear actually has nothing to say, and how privacy is actually a lot of different things all rolled up together. Helen Nissenbaum came up with a theory called contextual integrity, which basically says there are appropriate flows of information and inappropriate flows of information, and we can start thinking about what is and isn't appropriate, all guided by social norms and expectations. It's a really rich way of thinking about privacy.

But this is all getting maybe a little philosophical, and I'm a technologist; I like building things. So I often look concretely at privacy principles: what guides us when we build systems, especially interactive systems, those that engage with humans? Back in the mid 20th century, lots of governments and non-governmental organizations were grappling with what privacy principles we should have as we move into a data-based economy; this was happening even back in the 1960s. A lot of different organizations came up with similar articulations, but one of my favorites is from the US Federal Trade Commission: their Fair Information Practice Principles, or FIPPs. They said you should conceptually have rights like these: receive notice if data is being collected; have some choice about it; be able to access the data collected about you; we should be able to ensure the integrity of the data, both the security of the data so that it's not stolen and that it's accurate and not altered; and there should be enforcement of all this. These were conceptual principles, not particular laws.

Regardless, we were governed for many years by what a lot of people consider a notice-and-choice framework. Even though there are five principles listed, notice and choice are the ones you've probably encountered the most over your own lifetimes. Notice and choice: you're told something is potentially going to happen, and you arguably have some choice, or arguably don't have any choice. The way this manifests is often things like going to a website and seeing a cookie consent banner, where you play this ridiculous game of trying to find the "no" button, and the "no" button sometimes involves lots of different clicks. So that's the consent part of it, as well as the notice.
But notice itself, in my opinion, is kind of a failed right. Just as an exhibit of this, let's look at Stanford's web page, which I went to while trying to figure out how to get here yesterday. I went to stanford.edu, scrolled down to the bottom, and clicked on "Privacy." There's an online privacy policy, so I started reading, and I kept reading, and kept reading, and going and going; finally we're at section two, and it kept going and going, and I finally got to the end. And no, I did not actually read it; I had to finish making these slides. This doesn't seem very useful. I'm receiving notice, but I go to a lot of websites; this is not a very effective way of thinking about privacy. It's also so abstract. I'm not a lawyer, I have no legal training, I'm usually on the wrong side of the law, so I'm certainly not the person to be thinking about what this all means for me in this abstract world.

That said, there are privacy laws, and what rights you have or don't have differs based on where you live. For instance, in the European Union there's the General Data Protection Regulation, or GDPR. GDPR was passed in 2016 and came into effect in 2018, and you've probably heard about it: when GDPR came into effect, you probably got a lot of emails, and seemingly not much else happened. So what rights did GDPR give us? It basically said that people in the European Union have certain rights related to privacy: the right to request that data be erased; the right to object to certain types of data processing; that if there's a data breach, the company is required to let you know very quickly; a right of access to information collected about you, and we'll come back to this; and that privacy should be built into these systems by design, not bolted on at the end, because, as my PhD adviser likes to point out, that just doesn't work; you can't retroactively retrofit privacy into a system.

In the US, we don't have anything like GDPR nationally, yet or maybe ever; that remains to be seen. Historically the US has had what are known as sectoral laws: essentially, different aspects of privacy have their own laws. So children have their own privacy law governing how children's data is used; those of you who signed up for services when you were under 13 years old have violated this law. But it serves a really important purpose in trying to protect children's privacy; children are an entity whose privacy is important. We care about financial data and credit reporting; there's the FCRA. We care about educational data; there's FERPA, which I'm sure you encounter in your classes here at Stanford. We care about health data. These all seem like really important aspects of privacy, where the data privacy of these different dimensions of our lives really matters. So, anyone want to guess? I have one more law I'm about to put up. In addition to children, financial data, educational data, and health data, anyone want to guess what else is protected? Video rental records. And you might ask, why video rental records? It comes back to the 1980s, when there was a nominee for the Supreme Court.
That nomination did not, in the end, go through, but various newspapers started saying, let's try and dig up some dirt on this nominee, so they went to the nominee's local video store and asked what this person had rented. This is very reactive, and a lot of these privacy laws in the US are really reactive. It hasn't been that you have some deep, long-standing right to privacy; in fact, we don't have that at all explicitly in the Constitution. What we have had traditionally are these sectoral laws. But the European Union has, in terms of privacy law, been putting the US to shame, and so different states have tried to catch up. Most notably, since I'm here in California, California has arguably been the main leader among US states, with the California Consumer Privacy Act, or CCPA, which passed in 2018 and went into effect two years later, and the CPRA, which amended CCPA. Illinois has yet to catch up on this front, but a number of other states have since then. And, I'm not going to go into this, but there have been some pushes for a national US privacy law like GDPR; we're not there yet, although as of a few weeks ago some of these ideas have been revived. I thought this was completely off the table, but it's back, maybe. CCPA and CPRA give Californians a lot of the same rights as GDPR, plus some extra rights; let's not worry about those right now, because what I really want to worry about is this data-driven world we live in.

A bunch of you are probably excited about machine learning and large language models; that's kind of the in thing in computing right now. A lot of this is just driven by data: a lot of data has to be collected about you to drive these models. Large language models don't work without lots of data, and again, data comes from people. When I came to the University of Chicago, we had started expanding rapidly in the computer science department, but also building up a data science effort, and we had these questions: what is data science, what does this all mean? Some colleagues mentioned to me that "data is the new oil," quoting a mathematician and entrepreneur, and I realized after talking to them for a little bit that they meant this as a good thing. They were saying data is the new oil: there's so much potential, this is going to drive so much innovation. The whole time we had been talking up to that point, I had been thinking they were saying data is the new oil as if it's a bad thing. Sometimes you're digging in the ground and this liquid just starts spurting up, and you go to touch it and your hands get really dirty; the more you touch it, the dirtier your hands get, and it's really hard to wash off. Data, just like oil, causes a couple of people to make a lot of money and a lot of other people to be marginalized economically, and causes lots of geopolitical conflict. So I was like, yeah, data is the new oil, and then I realized we were just talking about completely different things.

So how does this data-driven world intersect with privacy? I like large language models in some cases; I use them in some of my own research. I'm frenemies with machine learning:
sometimes I build classifiers, and sometimes I'm an angry person yelling at the clouds about the classifiers. But how does this all intersect with privacy? If we read the news, we often hear that privacy is dead; I could have filled this screen with lots of "privacy is dead" newspaper articles. But what does it mean that privacy is dead in this data-driven world?

One time I was giving a talk like this at another university. And, by the way, this blank screen right now is intentional: I want you all to listen to me, don't just stay focused on screens at all times; our lives do not need to be mediated by screens. Anyway, I was at this university giving a talk, and at that time I actually had something on the screen: my title slide. I was about to start, and I got a question from the audience, and when you get a question from the audience on your title slide, you're usually in trouble. It was a talk about privacy, and the question was, "Privacy is dead." I said, well, that's not a question, that's a statement. And this person said, well, there's social media now, there's no such thing as privacy, this new generation has moved beyond privacy; what are you going to give a talk about, this thing is dead? So I asked the question-asker, can you please tell me the password to your laptop, hand me your laptop, I'll log into all of your financial sites; start telling me about your intimate relationships with your family; and also, please disrobe right now. This gentleman refused, and I said, well, okay, clearly privacy is not dead if you don't want to do these things. But then what does privacy mean? I learned later that this was actually the dean of engineering at that university, so I probably should not have sassed this person. But it got me thinking: what is privacy, how does privacy interact with this data-driven world, and can we actually use data to better understand privacy, to use data in some ways as a good thing?

So I'm going to talk for the next 20 to 25 minutes about different efforts we've made to provide transparency about online tracking. Since this is an HCI seminar, I'll start with a particular user study, which I think gives some insight into this world. This is a world where we go online and a lot of the things we do are tracked by all these different companies, using things like cookies or fingerprints; I potentially could have added sandboxes onto that list, since that's kind of the next frontier in mechanisms for tracking what we do. Things are concluded about us, and that drives a lot of online targeted ads and personalization and things like that.

So first, I'll talk about this user study we published at the CHI conference a couple of years ago, about taking data out of context to hyper-personalize ads. I had been talking with some of my colleagues and students about what it means to use data to personalize ads.
Is it going to reach a point where people say, this is just too creepy, I'm not going to participate in this ecosystem anymore? Are we going to have a citizen revolt of, just stop, this is too much? So we said, let's test a variant of this in a deception study, meaning people participating in the study didn't actually know what the study was about. Of course, you have to put a lot of care into how you do this ethically; there are debriefing requirements, both legal and ethical, and I'm happy to talk to any of you who are interested in these kinds of studies.

The design of the study was between-subjects, so different people were assigned to different conditions. Half of the people were shown what we call a generic ad: "Treat yourself this week. Use SuperEats to reserve a restaurant near you for a deal." My students really liked that I was paying them as research assistants to draw flying french fries. Half the people got this generic ad as a banner ad inside our survey platform, which led to some interesting feedback, like, "Oh, you poor university people, you have to use the ad-supported free version of Qualtrics," which does not exist. We were concerned that people weren't going to believe this, so for some participants we actually sent robotexts from a short code number while they were participating in the study, early on, automatically. So half the people, whether they saw a banner ad or a robotext, got the generic ad, and the other half got what we called the hyper-targeted ad: "Taylor, treat Ryan to a date night this week in Memphis. We know you love Thai restaurants. Use SuperEats to reserve a table at one of the seven near you for a deal."

How did we hyper-target this? It was actually a multi-part study. A few weeks prior, we had people sign up to do an initial study, again not knowing what it was about, and amidst a bunch of distractor questions we found out their name, their relationship status, and their partner's first name if they were in a relationship; we did some IP geolocation to figure out where they were; and we asked them about a whole bunch of things they liked, including the kinds of cuisines they liked. A couple of weeks later, when they came back for a follow-up study, it turns out almost everyone had forgotten they gave us this information, and the people who got these targeted ads were really creeped out. We got texts back to our short code number; some people thought we were a GrubHub gone wild, and we got lots of messages like, "How dare you use this information to target an ad towards me," thinking we were actually the restaurant platform. So people were really creeped out, which was one of our response variables. But subsequently in the study, after they had gotten this robotext or banner ad, we asked them a bunch of personal questions with the opportunity to opt out, and we thought that people who were creeped out, when they started filling out this big battery of questions, would be less likely to disclose personal information, in fact the very types of personal information we had used to target them.
And we did not find a difference. We thought this was weird, so we reran the study pretending we weren't the University of Chicago. Working with your institutional review board to pretend not only that the study is about a different thing than it is, but that you're a different entity than you are, involved a lot of discussion and negotiation, but we were able to do it in a way we all felt comfortable with ethically, and we had a huge debriefing process, including for people who dropped out of the study. In other words, what we saw was that people were creeped out, but they weren't changing their behavior. So what does this tell us about privacy?

From that we said, well, maybe if people better understood what was going on, they would act differently. So over the last few years we've been building a series of tools to try and provide transparency about privacy using data-driven methods. In this next study, "Oh, the Places You've Been," punning on the Dr. Seuss story, which my student Ben and others led, we looked at the space of existing privacy transparency tools and found that they didn't really tell us all that much that I found interesting.

[Audience member:] A clarification question on the previous study. I guess the trust, whether to believe this, takes a process: if the user has been interacting with this website, let's say, a few times, it's probably okay that the website can tell their name and their favorite cuisine, versus if you expose me to this for the first time, it's shocking. Did you control for this kind of established process of familiarity or trust?

Yeah, so in trying to unpack these results, my students and I talked for a long time, and we talked to colleagues in privacy across a bunch of different universities, trying to understand these possible factors, like trust in a platform, or trust in research studies; maybe people are disclosing because of that. And this was part of the explanation: if people are giving data for research, they might be more willing to do this than to give it to a company. So there is a lot going on in trying to unpack this. But for me the high-level takeaway is that data is taken out of context. You often don't know where the data came from, so it's hard to reason about trust in a platform. A lot of the people in our study who received these messages asked, how did you get my name, how do you know that I'm dating Ryan? And even when we told them during the debrief, well, you told us this a couple of weeks ago, they didn't believe us. That's interesting, because your data just kind of goes everywhere, and it's hard to make sense of it. So in these subsequent efforts we're trying to help people make sense of it.

The existing tools we have will say things like, BlueKai is tracking you right now, or, we've blocked 11 of the 25 trackers trying to track you. But what do I do with this information? It seems vague and abstract. And those are third-party privacy tools.
Even companies' own efforts at transparency are also really vague and abstract. When I ask Google what interests it thinks I have, Google says things like "cooking and recipes." Yes, everyone likes cooking and recipes; who doesn't like dinner? And that doesn't square with what you see if you go sign up as an advertiser to target ads on Google: it's not "cooking and recipes," you have much, much more specific topics you're able to target ads on. In fact, if you're using various social platforms, there are tens of thousands of different categories on which you can target ads.

So we built a browser extension that we call Tracking Transparency. What it does is, locally in your own web browser, our extension records you being tracked, along with metadata about the page and the contents of the page. We built our own shadow model, running entirely locally on your machine, to say, what is this page about, what ad topics might this imply? And it helps you visualize what's been tracked about you. We worked with some information visualization colleagues; they were really into these sunburst kinds of visualizations, and they were really not into the word clouds, but our participants were, so we just went with the data. We showed people all the different things Google might think they're interested in. We actually inadvertently ended up also being kind of vague, because we did it based on the most common categories, and I'll revisit this right at the end of the talk. It had some impact on people's perceptions, but less than I thought it would. Showing people data about how they had been tracked and what could be learned about them didn't really create this firestorm of, we need to protect privacy, we need to stop this online tracking.
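To make the aggregation idea concrete, here is a minimal Python sketch of the kind of counting that drove those visualizations. The real extension runs in the browser and labels pages with a local model; the log format and topic names below are hypothetical, purely for illustration.

```python
from collections import Counter

# Hypothetical local log the extension might keep: one inferred topic list per page visit.
# (The real extension infers topics with a local "shadow" model in the browser;
# these field names and topics are illustrative, not the actual schema.)
visit_log = [
    {"url": "https://example.com/recipe", "topics": ["Cooking & Recipes", "Food & Drink"]},
    {"url": "https://example.com/hiking", "topics": ["Outdoors", "Travel"]},
    {"url": "https://example.com/pasta",  "topics": ["Cooking & Recipes"]},
]

# Aggregate how often each inferred interest category appears across the browsing history.
topic_counts = Counter(topic for visit in visit_log for topic in visit["topics"])

# The most common categories are what would drive a word-cloud or sunburst view.
for topic, count in topic_counts.most_common(10):
    print(f"{topic}: seen on {count} pages")
```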
So then my students and I got really interested in, well, what can we do that's better than this? Can we actually leverage privacy laws to do better? It turns out that going back, not to notice and choice, but to access, the third principle from the Fair Information Practice Principles, actually helped us, because GDPR gives a right of access to data subjects, and so do CCPA and CPRA. Right of access means you can go to a company meeting certain requirements, so generally larger companies that collect personal data, which is a lot of the larger companies, and you have a right to get a copy of your personal data.

This actually led us, a couple of years ago, to an analysis of how ads are being targeted in the Twitter ecosystem. I had gone into this project thinking that the way ads were targeted on Twitter was based on things like my gender identity, my location, broad strokes of my interests. That does happen, but it turns out there's a lot more going on with ad targeting. The reality is you're being targeted like: you went to the grocery store and have regularly bought organic ketchup on your loyalty card, and lots of other information about things you're doing that is much more specific. The way we were able to understand this was enabled by these data subject access rights. We had a few hundred Twitter users volunteer for a study: volunteer to download their own data and share with us just one of the files they got back from Twitter under these data access rights. There's a file my students had found called ad impressions, which is a very long JSON object of what turned out to be all the ads that had been shown to you on Twitter in the last 90 days. One of the things that was really interesting to us as we went through this file was the "matched targeting criteria" key. We wondered, does this mean the criteria that were matched to you specifically? Because this is actually a lot more than what you see if you click "why did I get this ad" on Twitter, as my students had learned by spending months tracking their own data, downloading it every few weeks, tracking all the ads they were seeing, and clicking "why did I get this ad." So we found a European to invoke their privacy rights; they corresponded with Twitter's data protection office, and it took us a number of months to get the answer that yes, matched targeting criteria does mean what we thought it meant.

So we said, oh, this is so interesting, let's do a study about this. Again we had hundreds of people request their Twitter data, upload their ad-related data to us, and take a customized survey. The 231 participants in our study had been on Twitter an average of over six years and had been shown over a thousand ads on average in the last 90 days, and we found over 45,000 unique targeting types. So "location equals Chicago" might be one targeting type, "location equals Palo Alto" is a second, "interests equals trees" is a third, that sort of thing. We saw not just interests and demographics, but things like behavioral inferences, or maybe even just stereotypes of your behavior, as a less charitable interpretation, as well as things like tailored audiences. This was something I hadn't been familiar with: advertisers could upload lists containing hashes of personal identifiers, so a hash of your Twitter username, or a hash of your phone number, which also happens to be the phone number you use for your grocery store card, and they could say who they wanted included or excluded from their ad campaign. Twitter's terms of service prohibit targeting on race, religion, sex life, health, politics, and financial status, and yet, in the data these hundreds of people shared with us, we found keywords like "unemployment," "gay," and "#AfricanAmerican" being used to target ads; conversation topics like the Liberal Democrats political party in the UK; and then tailored lists, again hashes of personal identifiers, with names the advertising company gave to the lists when they uploaded them, for their own convenience, presumably not realizing these would be shared with data subjects. We de-identified these; they included things like "account status balance due," "Christian audience to exclude," and "LGBT suppression list."
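As a rough illustration of the kind of analysis this enables once you have the file, here is a minimal Python sketch that counts unique targeting criteria across ad impressions. The actual Twitter/X export nests this data differently, and the file name and field names below are simplified assumptions, not the real schema.

```python
import json
from collections import Counter

# Minimal sketch, assuming a simplified ad-impressions JSON file: a flat list of
# impression records, each with a "matchedTargetingCriteria" list. The real export
# is nested more deeply; these names are illustrative only.
with open("ad-impressions.json") as f:
    impressions = json.load(f)

criteria_counts = Counter()
for imp in impressions:
    for crit in imp.get("matchedTargetingCriteria", []):
        # A "targeting type" here is a (type, value) pair, e.g. ("Locations", "Chicago").
        criteria_counts[(crit.get("targetingType"), crit.get("targetingValue"))] += 1

print(f"{len(criteria_counts)} unique targeting criteria across {len(impressions)} ads")
for (ttype, tvalue), n in criteria_counts.most_common(20):
    print(f"{ttype} = {tvalue}: matched on {n} ads")
```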
My students saw these keywords and lists and said, okay, this is not okay, that our web experience is being targeted in these ways. And this was a lot more concrete to my students, and to me, specifically. My students asked, what can we do as part of this, how can we provide better ad explanations? So we reverse engineered how Twitter used this data to generate ad explanations, again comparing what my students saw in their data with what they actually saw on the Twitter platform. We did a similar thing for how Facebook does ad explanations, and then we made our own: speculative ad explanations with a lot more detail, either in text or, being slightly better designers, with some more visuals. And we ran, as part of this data-driven, customized study of Twitter users, a within-subjects evaluation comparing, with people not realizing which was which, in randomized order, all the proper things to do in study design, how people reacted to these different Twitter-like ad explanations, Facebook-like ad explanations, and the kinds we came up with, as well as a control that basically said, you got this ad because the advertiser bought an ad. What we found is that our participants much preferred the things we had come up with, with more detail about how they were being targeted. So we said, okay, clearly we can do better as a community on this if we wanted to, and we have a whole other ongoing project on how to build better ad explanations and how to actually compare what all the different entities are doing right now.

But we also got much more interested in this right of data access. My students, especially my student Sophie, saw how people were being targeted and thought, if only people actually looked at their data and knew what was going on, they would be shocked. So we said, maybe let's ramp up some projects asking whether we can make the right of data access something much more meaningful for privacy. Sophie and a number of our other students led this study on, basically, what could data access rights be? In particular it was a co-design study, so we had participants download their data. If you've never downloaded your data, and probably most of you never have, what you get is often a big JSON file or a series of JSON files, or a bunch of CSV comma-separated value files. If you download your data from Google, you get a gigantic zip file; if you're wondering what happens when you try to download a 40 gigabyte file, download your Google data. If you've ever wondered how big a JSON file can get, download your data from many other companies. What we did is we had people download their data and participate in remote co-design focus groups: 42 participants across 12 focus groups, each downloading their data from an assigned company. They came to our focus group, explored their own data on their own machine, not sharing it with each other or with us, talked about what they were seeing, and engaged in a number of co-design activities asking, if you wanted an interface for your data that shows you the information contained in it, what could that be? So these were end users engaging in speculative design, and we abstracted out from this what the design could be for helping people explore their data. One of the things we noticed, and we found this out in piloting and addressed it in the main study, is that a lot of our participants, when they first got their data, didn't even know how to open these files. They got a JSON file and couldn't figure out that they had to pick a text editor to open it.
Some of them tried to open it in their text editor and it crashed. They would get zip files and not understand how to unpack them. So even just accessing the data was a barrier. But once they were able to get to the data, they often felt overwhelmed; they felt some summarization tools were needed. So while the data downloads they were getting were compliant with their data access rights, they weren't very useful to participants. That said, they didn't want just summaries. They wanted to be able to explore, to search for specific memories, both because they wondered, does this company have this kind of data on me, and also to revisit particular memories; nostalgia in personal informatics is a really powerful force. There was also this idea of being able to audit companies. And beyond just searching and exploring their own data, they were interested in comparing their data in privacy-preserving ways with other people, and in understanding, what does this all mean, how is it used, why does the company have this? These, by and large, were things that are not supported by current data access rights.

So we've been continuing this in follow-up work, work that my students Arthur and Ellen will be presenting at this year's USENIX Security Symposium in August, and are actually presenting at USENIX PEPR in Santa Clara, just a little south of here, next week, in fact a week and a half from now. We did a follow-up study that was much more quantitative. Knowing that so many of our participants in the previous study had trouble exploring their data, we built, and by we I mean my students Arthur and Ellen and their colleagues built, a React app that helped people explore their data, search through their data, and understand how to parse these files. A big part of our study was having people annotate subsections of their data to tell us what they found interesting, what they found creepy, and what they found confusing or didn't understand, to help us figure out how to build a better tool. We also asked participants what questions they would want answered before they looked at their data. They want to know things like what data is stored, how the data is being stored and used, and what the high-level takeaways and implications from the data are. A lot of these things were not answered by just roughly exploring their data, even with the search and exploration features we provided; there's still a long way to go to make this much more useful. Participants were also often confused by undefined terms; there aren't data dictionaries provided, by and large, for these data downloads. These are often machine-readable downloads, and there are actually rights I didn't talk about, for data portability, the idea that I could download my Twitter or X data and move to a different, competing platform; those are parts of privacy and data protection laws as well, and that data needs to be in machine-readable format, which these downloads often, but not always, are. So our participants would see these crazy numbers and ask, what is this wild number? It's a Unix timestamp, but why would you know that if you haven't studied computer science or data science?
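For a sense of what even a first-pass summary and a timestamp decode look like, here is a minimal Python sketch; the file name and structure are hypothetical, since every company's export looks different.

```python
import json
from datetime import datetime, timezone

# Minimal sketch of a first-pass summary of a data download, assuming a generic
# JSON export shaped as a dict of record lists; real downloads vary a lot by company.
with open("my_data_export.json") as f:
    export = json.load(f)

for section, records in export.items():
    count = len(records) if isinstance(records, list) else 1
    print(f"{section}: {count} records")

# The "crazy numbers" participants saw are often Unix timestamps: seconds since
# January 1, 1970 (UTC). Decoding one makes it human-readable.
raw = 1715000000
print(datetime.fromtimestamp(raw, tz=timezone.utc).isoformat())
# -> 2024-05-06T12:53:20+00:00
```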
Right. So we're currently in the process of building what we call the Data Access Illuminator tool and ecosystem. Our idea is that we want to help consumers understand the scope and implications of the personal data collected about them, through their own data: richer tools for people to perform a kind of personal data science on their own data without needing to know data science, visualizing it in interesting ways, and giving them the opportunity to compare themselves with others using private computation and other fancy crypto techniques. In general, this is an ecosystem we're helping to push forward to reframe the narrative of how people interact with their data. Because right now, when you ask a company through their privacy dashboard, what data do you have about me, they get to control the narrative; they frame what they tell you. And if you download your data, you get the raw data, but it's hard to build understanding. We're HCI researchers; if you have a 70 megabyte JSON file, there's no narrative there other than confusion and frustration. So we've been really interested in how we can provide our own narrative for privacy. What if we could control this narrative? How do we control it, and how do we frame this in ways that might be compelling to people?

We've also said, well, computer science is not the be-all and end-all of the world, and neither is data science; there are people who know a lot more about communication and about experiences. So we've started thinking more about the role of art in privacy, and the role of provocations, intentional provocations, about privacy. I go back to different artworks I've seen; I have too many hobbies, and whenever I go to a new place I go to the art museum, especially the contemporary art museum, and there are a lot of different artists building works about privacy. These are some photos I took of myself walking around an artwork about surveillance a number of years ago. It would show the camera zooming in on me, and there was actually a record of all the people who had visited the exhibit recently. This is just one representative of lots and lots of different artworks about surveillance and privacy, and some of my students are working to catalog and taxonomize this area right now.

We've also said, as part of education, how do we think about understanding this world and bringing together people who normally don't have the deepest dialogues? So through classes we've been bringing together art students and computer science students. In addition to lots of great fine arts faculty at UChicago, one of the world's great art schools, the School of the Art Institute of Chicago, is just a couple of miles from campus, and so we've now twice, and this fall will be the third time, co-taught a class, myself and faculty at SAIC, bringing together a cohort that's half fine artists and half computer scientists to build artworks reflecting on privacy in the digital and physical worlds. For instance, in this photo, on the two sides are my PhD students Emma and Kevin, and in the middle is Tom from SAIC.
They've been building wearables really focused on making you reflect on your agency: wearables that caption what you're saying, but subversively change what you're saying in small ways, to perhaps not reflect your values, or to censor you. We've had students do design iterations on, what if we had a mechanism for a plant that would automatically water the plant, but it would only water the plant and give it light if the things being said around it were really positive and affirming? So, thinking about how the data is used: let's actually watch you have this plant die when you go to New Jersey and are no longer around it. We've had students build interactive chatbots where the chatbot starts revealing information about itself, and then uses the information you disclose in response to build up a dossier, and prints an FBI-style dossier on everyone who visits the exhibition; that's the power of large language models and the power of the internet. We've had students look at physical mechanisms like fog as a way to protect privacy, but also to give a false sense of privacy. These students, for one of their projects, filled a greenhouse with fog, then tracked people inside using non-visual mechanisms and projected on the outside what was being tracked about the person who thought they were protected by the fog. Which also led to an interesting phone call I got from our building manager one day: "Blase, what's with the greenhouse?" And I said, "The what?" They said, "Well, your students are assembling a greenhouse in the middle of our first floor, and they said their professor said it was okay." It turns out my colleague at the Art Institute had said, sure, build your art wherever you want, and I was like, no, no, please take down the greenhouse. But I think it was really provocative.

Through these exercises, through these collaborations in both research and teaching, I've gotten really interested in what it means to have a privacy provocation as user experience. As people who care about privacy, can we actually frame the user experience to have particular goals, or to make people feel a particular way? How do we essentially modulate people's reactions to their own data? So Nathan, a student of my collaborator Michelle's at the University of Maryland, is presenting work at the PETS symposium this summer where we made a new version of our Tracking Transparency browser extension that is intentionally creepy. Rather than just showing you the topics you encountered on the web by frequency, we filter by which are potentially the most sensitive things you've seen, and we try to infer things about you that could be inferred from the data but maybe aren't the first priority of a company: what do you search for late at night, when are you sad, when you're listening to Spotify what's your most emo phase and when is it happening, these sorts of things. Having these intentionally provocative experiences continues to be, for me, a really compelling area of inquiry.
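As a rough sketch of that "creepy by design" framing, here is what filtering a hypothetical local browsing log by sensitive categories and late-night activity might look like in Python; the category list and record format are assumptions for illustration, not what the extension actually uses.

```python
from datetime import datetime

# Rough sketch of the intentionally creepy framing: instead of ranking topics by
# frequency, surface potentially sensitive ones and late-night activity.
# The sensitive-category list and the record format here are hypothetical.
SENSITIVE = {"Health Conditions", "Debt & Loans", "Dating", "Mental Health"}

def creepy_highlights(visit_log):
    sensitive_hits = [v for v in visit_log if SENSITIVE & set(v["topics"])]
    late_night = [v for v in visit_log
                  if datetime.fromtimestamp(v["timestamp"]).hour in (0, 1, 2, 3)]
    return sensitive_hits, late_night

visits = [
    {"url": "https://example.com/loans", "topics": ["Debt & Loans"], "timestamp": 1715050000},
]
hits, night = creepy_highlights(visits)
print(f"{len(hits)} visits touched sensitive topics; {len(night)} happened between midnight and 4 a.m.")
```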
So overall, what we're trying to do is use these data-driven user experiences: build systems that ingest data, visualize it in interesting ways, and provide experiences so that people better understand what's being collected about them and what can be known about them, and then how, according to us, they might think about privacy, maybe giving them alternative narratives. So, through these mechanisms, to revisit the title of my talk: can privacy exist in a data-driven world? Yes, seemingly it can, but we, as the designers and system builders, need to understand what our contribution to this is, and whether we are collecting data just to see what can be done with it, or helping people engage with their data.

Before I wrap up, and I'm going to open it to questions in a minute: if you just kindly sat through about 45 minutes of me talking about privacy and you're not interested in privacy at all, our group actually does a bunch of other things too. One of the things we've worked on recently was presented virtually last week at CHI; I missed the memo, I thought we were all boycotting CHI in Hawaii, and it turns out it was me and my students boycotting CHI in Hawaii, and a few other people. We built this system called Retrograde, for JupyterLab, because I'm all about the bad puns as paper titles. What it does is track how you read data into a Jupyter notebook in a JupyterLab instance, and how the data changes over the course of you cleaning the data, modifying it, and training classifiers, and throughout the process it provides data-driven, contextual nudges about potential things you should be thinking about related to fairness and bias: are there demographic patterns in missing data, are there potential proxy variables in your data, how do the different models you've built compare in terms of cross-group differences in model performance, and lots of other things. We've been doing a lot of work on that and continue to work in this area.
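To give a flavor of the kinds of checks those nudges are about, here is a minimal pandas sketch of two of them, demographic patterns in missing data and cross-group model performance, written as standalone functions. Retrograde itself hooks into the notebook environment; the column names here are hypothetical.

```python
import pandas as pd

# Sketch of two of the nudges described above, outside of JupyterLab.
# Column names ("gender", "income", "label", "prediction") are hypothetical.
def missingness_by_group(df: pd.DataFrame, group_col: str, value_col: str) -> pd.Series:
    """Share of missing values in value_col, broken out by demographic group."""
    return df.groupby(group_col)[value_col].apply(lambda s: s.isna().mean())

def accuracy_by_group(df: pd.DataFrame, group_col: str) -> pd.Series:
    """Per-group accuracy of an already-trained model's predictions."""
    return df.groupby(group_col).apply(lambda g: (g["label"] == g["prediction"]).mean())

df = pd.DataFrame({
    "gender":     ["f", "f", "m", "m", "m"],
    "income":     [55000, None, 72000, 61000, None],
    "label":      [1, 0, 1, 0, 1],
    "prediction": [1, 0, 0, 0, 1],
})
print(missingness_by_group(df, "gender", "income"))  # is income missing more often for one group?
print(accuracy_by_group(df, "gender"))               # does accuracy differ across groups?
```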
Passwords have always been one of my favorite topics; I worked for many years on modeling passwords in my own PhD dissertation, and now my students and I are doing lots of interesting work on that. We collaborated with UChicago's IT Security Office over the course of about five years, and were finally able to write a paper about it last year, looking at essentially how vulnerable the university was to password guessing attacks leveraging reused passwords. That is, if you used the same or a similar password for UChicago as you did for LinkedIn, and LinkedIn was hacked, how vulnerable were you? We collected over 400 different data breaches and found thousands of University of Chicago affiliates who were vulnerable to these attacks; some of them had been exploited, some hadn't, and those were still their passwords. So through this big collaboration we had them reset their passwords, and we got a lot of insight into how vulnerable people are and how big a problem this is. Reused passwords were much more of a problem than common passwords. That is, if you think the biggest worry is just that your password can't be one of the 100 or 200 most common things people would guess, and that if you come up with one good password you can use it everywhere: do not do that. That actually puts you at a lot of risk, and reuse is one of the main threat vectors. We were actually able to retrospectively find a bunch of successful attacks on the university that had leveraged passwords reused from certain places. I guess I'm a little too old for this, but I learned all about Chegg, this homework help site; I hear some snickers in the audience. It turns out Chegg was a gold mine of UChicago students' passwords, and LinkedIn was a gold mine of UChicago staff and faculty passwords, and we were able to trace this all back. Moving forward, we have a bunch of work on passwordless authentication: how do you use passkeys and other techniques to move away from passwords for web authentication? Happy to talk about these things.

We also do a lot of work on end-user programming. We got interested in this through these if-this-then-that style, event-driven rules, and we've been doing a lot of work building systems that rely on formal models of the world to help people program. Through our formal model, we can show you, if you change your if-this-then-that rules in certain ways, how the behaviors or properties of the resulting program differ. We can let you state some desired properties, automatically translate them into formal logics like linear temporal logic, do model checking on the system, and automatically synthesize or modify rules to comply with those properties.
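To make that concrete, here is a toy Python sketch of the trigger-action idea: rules as data applied to a stream of events, plus a simple temporal property checked over the resulting trace. This is only an illustration, not the group's actual formal-methods tooling, which translates properties into linear temporal logic and uses a model checker; all rule and event names below are made up.

```python
# Toy illustration of trigger-action ("if this then that") rules and a property check.
rules = [
    # "If motion is detected at night, then turn on the porch light."
    {"trigger": ("motion", True), "condition": ("is_night", True), "action": ("porch_light", True)},
]

def step(state, event, rules):
    """Apply one event, then fire any rule whose trigger and condition match."""
    state = {**state, event[0]: event[1]}
    for r in rules:
        t_key, t_val = r["trigger"]
        c_key, c_val = r["condition"]
        if state.get(t_key) == t_val and state.get(c_key) == c_val:
            a_key, a_val = r["action"]
            state = {**state, a_key: a_val}
    return state

# Desired property (informally, the LTL formula G(motion & is_night -> porch_light)):
# in every state where there is motion at night, the porch light is on.
def property_holds(trace):
    return all(not (s.get("motion") and s.get("is_night")) or s.get("porch_light") for s in trace)

state = {"is_night": True, "porch_light": False}
trace = []
for event in [("motion", True), ("motion", False)]:
    state = step(state, event, rules)
    trace.append(state)

print(property_holds(trace))  # True for this rule set and event sequence
```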
More recently, over the last few years, we've gotten interested in the role of large language models as a form of end-user programming: rather than writing rules, you just state in prose what you want to happen, and we have a lot of ongoing work on understanding the usability of that world. But we have lots of other interests: security, usability, privacy, ethical AI, LLMs; whatever seems cool, we work on. And yeah, thank you so much for listening; I'm happy to take questions. [Applause]

[Host:] Any questions from the audience? I'm happy to just get it started. A very nice talk, and you mentioned large language models at the end. I think today the biggest challenge for many users is that we don't even know where the data for the model comes from, or what type of data they are collecting from us; the prompts we give to models are part of the process. We also hear news about, for example, Samsung banning the use of large language models because of privacy leakage, and there are stories of ChatGPT leaking someone's phone number because other users used a very malicious prompt. It feels like the privacy issue is getting more and more severe in the context of ChatGPT and others. Do you think that any of the boundary regulation, or any of the tracking and analysis your group has been working on, can be applied to this new context, and if so, what are some perspectives for understanding that?

Right, yeah, great question. Large language models really rely on a lot of data, and in the way we interface with them, and this is true of a lot of systems that have potential privacy implications and rely on data, our interactions, the utility, the fun experiences of the system are usually really obvious, but the privacy implications are often very opaque. I go to ChatGPT, I type something in, and I get a funny story. One of my favorite things to do, for people who have been in academia long enough that they have enough of a web presence to be scraped up into the training data, is to ask ChatGPT to tell you about them. I actually learned recently that the story ChatGPT wrote about my colleague Pedro at UChicago claimed he was a marathon runner, and Pedro has not run one block in his life, so we all thought this was really funny. But our experience is that the privacy implications are all happening behind the scenes. So this is my call to people, to designers and system builders: asking how we make privacy more transparent plays a really important role in raising awareness and getting people to think about these issues. You could imagine building a front end to ChatGPT that also helps you understand what privacy leakage is happening. And we're obviously not the first people to think about automated transparency in these systems; actually, some of your faculty here at Stanford have done pioneering work in making ubiquitous computing systems more transparent. I think there's a lot of power in that.

But then there's also this question of what we as a society want. Among the couple of different classes I teach at UChicago, one is a problem-set-based, programming-based ethics class. Unlike a lot of universities' ethics classes, where you talk abstractly about ethical issues, this class really focuses on problem sets where you build a thing, and then we all as a class reflect on what everyone built, what the shortcomings are, and how we feel about these trade-offs. The last lecture was actually last week, and one of my good friends, in fact my undergraduate college roommate, is a lecturer here at Stanford in political science, is one of the teachers of the undergraduate class on justice, and teaches classes on ethics. We had a really interesting debate last week about, among many other things, large language models: well, they need the data, so can they just have the data? What do we think about this as a societal value? So I would encourage you all to think about what society you want to live in, and how we use a combination of government regulation, non-governmental organizations pushing for better privacy, and systems you all can build that help us embed our societal values in these systems.

[Host:] Any other question? Here.

[Audience member:] So I'm being recorded even though I didn't want to be? No, that's fine. Is it working? You can't hear it? Okay, sure. I've got a couple of questions, but first, I'm concerned about the time sink involved in being private. Transparency doesn't solve that; it actually just adds a workload to me. I mean, it takes away some workload, but then you've got all these actions you've got to take. And, to connect a couple of ideas, I thought the provocation was good, because in a sense it leaps forward to the implications and not just to revealing what data is there. I guess what I'm interested in is: how do I get to my action items? What is my call to action? I need to stop having this recorded, or get this data deleted, or get a government initiative started, or whatever. I think you're on that path, and I'm just curious what you think about getting to that end point.
Yeah, I think that's a very astute observation: essentially you're giving yourself a lot of work, and potentially it's an insurmountable barrier, or an unreasonable amount of work you need to do, to stay private. So I look at transparency not as something you have to engage with for every system, but as a question of how we provide enough provocative transparency to get you to care, and then whether we have alternatives. In the web browser space, different browser vendors, the Brave browser, for instance, or Mozilla Firefox, are really pushing default settings that are more private than, for instance, Chrome's, and trying to get people to care enough to switch. But then there are often these trade-offs. If you use the Brave browser out of the box, some websites you visit are just not going to work, and it's really frustrating; it's really obvious to you when you're trying to add something to your shopping cart or watch a video and it's broken. So, and I didn't talk about this today, we're really interested in, if we have these more private default settings, how do we actually get the world to work about as well? In the context of web browsing, if you're blocking a lot of tracking, how do you automatically un-break websites; are there ways to do this automatically? That's something a bunch of my students are working on. But there are also ways to design systems where, for instance, if you're using a large language model, similar to something Omar and I were talking about this morning, you can personalize it to yourself locally; data doesn't need to flow to the cloud for certain types of personalization. But unless you realize you should care about these things, when a vendor says our model is not as good but it's more private, how do you actually get people to accept that it's not going to work quite as well, but that the overall trade-off is what they want? So I think, and I'm happy to talk more offline, we're maybe out of time for the class. Thank you all. I'll be around for the day, and I actually have some open slots in the afternoon if any of you want to talk; happy to talk to anyone who's a student or anyone who just came to the talk. So thank you.

[Host:] Thank you so much. Thank you.