It's my great pleasure to introduce Judy Fan, who's our colloquium speaker. Judy is one of my favorite cognitive scientists, really of any age. I think she's, by any objective standard, one of the leading people of the younger generation, people who are junior faculty. I don't even know how junior you are, but she's been an assistant professor at Stanford for a couple of years. I think that's a good estimate, right? And she's by any means one of the rising or glowing superstars of our field. She's won a number of awards: the Glushko Dissertation Prize, the NSF CAREER Award. I think you won that, right? Real-time talk introduction. But none of that is really the reason that we invited her. I think of Judy as one of the most creative researchers I know of, at any stage, in any field, and I really mean that deeply. She has a background coming out of neuroscience, and in many ways is very much at home talking to all the sorts of people who inhabit this building, who record signals from brains and try to make sense of them computationally. She's done a lot of that. She's worked with colleagues of ours whose work we know well; Dan Yamins was one of her mentors. And she has studied vision. A lot of her background is in visual psychophysics, visual neuroscience, and computational neuroscience. But if you look at her research, some of which you'll see today and a lot of which you won't even see today, she's gone in many other directions, or let's just say one key direction, which is from some of the most basic processes of perception, and the things that you can describe with nice, elegant computational models, to aspects of cognition that really are what make us human, both biologically and also culturally. And you're definitely going to see some of that here.
And so that means she's studied things like artistic or creative expression in visual and other media. She's very interested in narrative expression. She's really interested in education, learning and teaching, and how we make sense of the world symbolically, through data and explanation. So she's really moved towards these much more complex cognitive processes that are distinctively human, both biologically and culturally, and increasingly of great import in our current era. But she hasn't given up any of the rigor, or I would say her taste for rigor. And I know from a lot of interactions with Judy that what keeps her up at night is the struggle and the challenge of how you get at these really important, hard things with the kind of rigor and precision that many of us grew up valuing and being inspired by in, for example, visual neuroscience and computational neuroscience. I can't say she's completely solved that problem. It might take a while, but it's a really great challenge. The way she works on it inspires me, and I hope it's one that we can all learn from and be inspired by, and see where she is and where she's going with this. So, Judy, please tell us.
Okay. Wow. I wasn't totally prepared for that. That was incredibly kind, Josh. Thank you, and thank you all for being here. I can't actually express how wonderful today has been so far. I have a lot of affection and respect for this department and community and the kind of values, scientific and otherwise, embodied in the work that you all do here. And so it really is a treat to spend a few minutes telling you about some of the work that we've been doing. So we study cognitive tools. What are those? Okay, let's start with something as familiar and simple as the number line. I've been told not to move around as much because we're using this mic.
Okay, I'm going to try to do that. Nature didn't give us the number line. We invented it. As the Spanish architect Antoni Gaudí once put it, there are no straight lines or sharp corners in nature. But that didn't stop us. We created them anyway. And we extended the number line, of course, a few hundred years ago to create rectangular coordinates, which turned out to be super useful. They were genuinely cutting-edge tools for thought, driving new mathematical discoveries when René Descartes and his contemporaries realized that you could link up algebraic expressions with geometric curves in order to solve all kinds of mathematical puzzles, including this one that had stumped the world for millennia. How many are familiar with the Delian problem, this riddle of the oracle of Delos? It's the problem of how you double the volume of a perfect cube. It's really, really difficult to do using the mathematical methods of that community and time. And here's the solution for that. It'd be really hard to overstate the impact of this invention over the last four centuries. That technology converted all kinds of problems like these, where you might want to find the set of values that satisfies two equations, into essentially problems that relied on locating points of intersection between two curves. Now imagine what happened next to that revolutionary tool. It became something that we could take for granted. It became so useful that it's basically indispensable to the way every generation is educated. Virtually every mathematics curriculum on the planet introduces a combination of symbolic and graphical notation for representing and manipulating mathematical objects. And the question that we wrestle with is: how did we get here, and what is it about the human mind that makes that kind of continual innovation possible?
There are lots of ways of approaching that question from many different disciplinary perspectives, including history, anthropology, and economics, and I think that cognitive science also has a really important contribution to make here. I think the story begins at least 30,000 to 80,000 years ago, when anatomically modern humans began to mark up their physical environments, including these cave walls here, iconically repurposing objects and surfaces in their surroundings into carriers of meaning. We obviously didn't stop at cave walls. The story of human learning and discovery is deeply intertwined with the story of technologies for making the invisible visible. So here are just a few examples from the history of science, which I love. We've got Darwin's finches, and these illustrations produced by John Gould, the ornithologist that Darwin worked very closely with. Only when you see these cases side by side does the morphological variation begin to really become salient and pop out. We have the telescope that Galileo used to observe the movement of the moons around Jupiter, with the kind of resolution that he needed in order to question the orthodoxy when it came to how the solar system was organized. We have Ramón y Cajal's much-celebrated drawings of the retina as seen under the microscope, showing us what different parts of that part of the nervous system were like and how they were hooked up to one another. And in the 20th century, we have Feynman diagrams, named after, of course, the physicist Richard Feynman, showing us subatomic particles winking in and out of existence, events that we literally could not and will never be able to observe directly with the naked eye. And more than any other species, we leverage this expanding understanding of the world.
But maybe to go back just one beat, I want to draw your attention to the variation on this slide. Notice that some of these images are quite detailed and faithful, if you will, to the way the visual world looks to us when we open our eyes. Take Darwin's finches at the top left, for example. Others are much more schematic. But what all of these examples share in common is that they leverage what I've been calling visual abstraction to communicate what we see and know about the world in a format that highlights what is relevant to notice. Okay. And then, building on that, we leverage that expanding understanding of the natural world, gained through the use of those tools for learning, in order to create new things. So for example, our detailed understanding of physical mechanics allowed us to design and then build high-precision timekeeping devices. And a lot of this technological progress has been driven by our ability to continually reformulate our understanding of the world in terms of those useful abstractions that make it possible to re-engineer the physical world according to our design: translating biological insights into bioengineering, physical theory into advanced physical instrumentation, neuroscience into medical devices, and quantum mechanics into modern electronics. Okay. So this is a sampling of the kind of phenomena that I take a lot of inspiration from, and the question we continue to wrestle with is: what about us makes all that possible? This is a schematic that I've been using over the last few years to help me think about the key behavioral phenomena in play, and that'll also serve as a framework for embedding the different lines of work that I'll be sharing with you.
So this is one way of illustrating the traditional mode in cognitive psychology, my home discipline, which focuses on how people process information supplied by the external world. Here's how that picture is enriched by the study of social cognition, which considers the behavior of multiple individuals at once and how they interact with each other. When those activities are used in the service of learning about the world and sharing that knowledge with others, they've been argued to share important similarities with formal science, even when pursued by non-experts in everyday contexts. So, building on that tradition, and with the goal of understanding how humans made all those remarkable discoveries and inventions that I shared with you just a minute ago, I'd like to argue that there are still two critical ingredients missing from this picture. First is an account of cognitive tools or technologies: material objects that encode information intended to have an impact on our minds, on how and what we think. Second, I'd like to argue that the time has come to embrace science's natural complement, engineering: how people leverage their understanding of the world, whether it comes through direct experience or is socially mediated, to create new and useful things. Because without a serious consideration of the engineering half of this picture, I would venture to say that we'll never be able to explain how and why the world as we know it came to be. So at its core, to state it really bluntly, research in my group aims to close this loop: to develop psychological theories that explain how we go about discovering useful abstractions that explain how the world works, jointly with theories that explain how we then apply those abstractions to go make new things. And my plan today is to tell you about some of the work that we've done so far along two lines.
In the first part, I'm going to share with you what I think we've figured out about how people leverage visual abstraction to communicate semantic knowledge, using freehand drawing as a central case study. Ordinarily, in the second part, I would then have told you about our work investigating how people learn and coordinate on procedural abstractions when building physical things, aligned with the engineering segment. But today I wanted to try something new with you all. So instead I wanted to share with you some of our emerging work exploring the cognitive foundations of data visualization, and I'm really curious to hear what you think. That's where people harness multiple information modalities (graphical elements, words, and numbers) to engage in statistical reasoning; in other words, to learn from finite amounts of evidence about aspects of the world that might be difficult or impossible to learn through direct observation by a single individual. Okay. And I'm still happy to chat with folks about our work on physical assembly and physical reasoning, probably at the reception. Okay. So to dive into part one: how do we begin to think about how people use visual abstraction to communicate what they know and what they see? Here I think it's useful to think about three behaviors that build successively on one another. First, of course, visual perception: the problem of how we transform raw sensory inputs into semantically meaningful perceptual experiences. That in turn makes it possible to even contemplate visual production, that ability to generate a set of markings that leave a meaningful and visible trace in the physical environment. And those come together during visual communication.
That is, how we decide just how to arrange those graphical elements, and in what order, to have a particular kind of impact on other minds, whether it is to inform or teach, persuade, collaborate, or any other purpose that we can put those marks to. Okay. So this is going to be an overview of the three studies that I'll be walking through. First, we'll begin with the question of what the perceptual basis is for understanding what a certain subclass of pictures represents. To get off the ground, we started by considering maybe the most concrete and familiar instantiation of visual abstraction: creating a drawing by hand that looks like something in the world. What makes it so easy to tell that the drawing on the left of this slide is meant to correspond to the realistic bird rendering on the right? There have been a lot of different ways of responding to that question, and two have been really dominant. The first view is that we essentially see drawings as meaningful and representing things because drawings simply resemble objects in the world. Okay? Like, this drawing literally looks like the bird, and that's how we know. The second response is that drawings denote objects primarily as a matter of convention, and we only learn which drawings go with which objects and meanings from other people. Take this Chinese character, for example. So in an earlier line of work, my close collaborators Dan Yamins and my PhD adviser Nick Turk-Browne and I discovered that general-purpose vision algorithms, in this case neural networks that compose multiple layers of learnable spatial convolutions and that were trained on natural photographs, were capable of generalizing fairly strongly to even quite sparse sketches that didn't look photorealistic per se.
That suggested that the problem of pictorial meaning and resemblance might be resolved simply by building better models of the ventral stream of visual processing, especially ones that accurately capture the operations performed in those brain regions. We were not the only ones to discover this. There are easily thousands of computer vision papers indexed by Google Scholar that use some kind of convnet or other neural-network-based backbone to encode sketches and natural images for a variety of applications. All of those results you can think of as vindicating an updated, modern resemblance-based account. It was also a really useful insight for us locally, with other practical consequences. So in some more recent work led by Charles Lu, a former master's student in the lab, in collaboration with Xiaolong Wang at UCSD, we built on top of those early findings to further stress-test that resemblance account, basically taking a convnet backbone and then training a decoder on top that could map local elements in a sketch to corresponding elements in a photograph, under the constraint that you could warp or rumple the sketch but not tear any holes in it. Okay. And the success of that approach, which I'm just demoing here (it's not really a result, it's more of a demo), suggests that fairly strong spatial constraints govern how the parts of sketches correspond to the parts of the real objects that they're meant to represent. Okay, so that's great. We've got good enough trainable models of sketch understanding to build downstream applications that work pretty well. I guess we can pack up and go home. But of course that's not the whole story. A static, deterministic account of visual processing falls short of explaining how we generate and make sense of drawings like these, which you might find all around this building.
What these blobs and boxes, squiggles, and arrows mean depends on what we're talking about. So our next goal was to figure out how to incorporate that information about context, to begin to account for the greater variety of graphical representations that we in fact use to communicate, ranging from more faithful pictures like those on the left here to the more obviously symbolic tokens on the right. Our first paper tackling that issue asked how people knew when they needed to produce a more faithful drawing and when they could get away with something more schematic or abstract. So in that study, we paired two people up to play a drawing game. The sketcher saw a display that looked something like this, where their goal was to draw the highlighted target object, the third one here. And we varied what the other objects in the display were. On close trials, the distractors all belonged to the same basic-level category, whereas on far trials, the distractors were from different categories. Okay. Using this really simple manipulation, we discovered how readily ordinary folks can adjust the way they use depiction to communicate, making more detailed and faithful drawings on those close trials, when they needed their drawing to uniquely identify a particular exemplar, but then sparser drawings on the far trials, when they could get away with these category-level abstractions. So here are some examples of actual drawings that we collected in that study. We found that sketchers used fewer strokes, less ink, and less time to produce their drawings on far trials, while still achieving ceiling accuracy on the fundamental task of communicating the identity of the target visual concept to the viewer, who themselves took less time on those trials to make their decisions. Then, to capture that behavioral pattern, we proposed a computational model of the sketcher that consisted of two parts.
The first part was a convnet to encode visual inputs into a generically abstract feature space, and the second was a probabilistic decision-making module that inferred what kind of drawing to make depending on the context. I'm just going to give you the take-home from that study, because I'm really excited to get to other work in this talk. The take-home from our model ablation experiments was that both the capacity for visual abstraction, which we operationalized as the network layer in the visual encoder module, and sensitivity to context were critical for capturing how people manage to communicate about these objects at the appropriate level of abstraction. Okay. And then in more recent work, we pushed that idea even further to understand not only the impact of the current referential context on how people communicate, but also the conditions under which new graphical conventions might emerge, when memory for previous interactions with the same person leads people to produce even more abstract, maybe even proto-symbolic tokens over time, whose meaning depends even more strongly on that shared history. Okay, so all of that was really exciting progress to me. But of course, people possess much richer knowledge about the world than just what things are called or what they look like. And a particularly important way that we use visual abstraction, especially in science, is to transmit mechanistic knowledge about how things work. So what is going on in people's minds when they make that move?
That is, going beyond what is visually salient about, let's say, a specific bird to highlight underlying physical mechanisms, for example how birds achieve flight in general. So you can imagine my excitement when Holly Huey, a wonderful former PhD student in the lab who's now at Adobe Research, was also fascinated by this question. And while we knew from very cool work by Frank Keil and colleagues that people privilege mechanistic explanations when learning from others, and from work by Barbara Tversky, Micki Chi, Tania Lombrozo, and others that people can learn by producing explanations, we realized there was a lot that we didn't know about how people thought about visual explanations. Like, what do people think is supposed to go into a diagram that illustrates how something works, and what makes those different from an ordinary illustration that is just intended to look like something? One possibility that I'll sketch, which I'll call the cumulative hypothesis, is that people basically think of visual explanations as being like extended, augmented versions of ordinary depictions. So in this table, explanations would have all the things that depictions have to communicate visual appearance, and then tack on information about physical mechanism. Okay. An alternative possibility, which I'll call the dissociable hypothesis, is that people think of explanations as being images that pick out mechanistic abstractions while greatly de-emphasizing visual appearance. There's a kind of selectivity there. So, to tease apart those possibilities, Holly designed a study to probe this question in two ways: first, by characterizing the content of visual explanations in detail and comparing them to visual depictions, and second, by measuring how well either of those kinds of images actually helped downstream viewers perform tasks where they had to extract the information they really needed, whether it's about appearance or about mechanism.
So rather than a lesson on bird flight, which is fascinating but complicated, Holly constructed six novel contraptions with a clearly observable mechanism for closing a circuit and turning on a light. Okay. So, here's an example of one of those machines and the instructional video that participants watched. Okay. All right. Participants in the study actually saw that twice, those two demonstrations twice, but there you go. So notice that this machine consists of three different kinds of parts. There are causal parts that need to rotate to turn the light on. There are non-causal parts that look very similar but don't actually cause the light to turn on. And then there are background structural elements that are colorful and hoist up the other parts, which are really important but don't directly participate in the light activation circuit. Every participant produced explanations of some machines and depictions of other ones. On explanation trials, they were asked to imagine that their drawing would be used by someone else to understand how the object worked. And then on the depiction trials, they were asked to imagine that their drawing would be used by someone else to identify which object it was out of a lineup of similar-looking objects. Okay, that's the manipulation. Really simple. Using that procedure, we collected a large number of depictions and explanations of each of these six machines. Eyeballing them, the drawings seem to look different. For example, maybe there's a little bit more background in the depictions. Maybe there are some more arrows in the explanations. But Holly wanted to be really systematic about this. So she crowdsourced tags assigning every single stroke in every drawing to one of four categories: one for each of the three kinds of physical parts (background, causal, and non-causal), and then a fourth category that was a catch-all for symbols. In this particular context, we're talking about arrows and motion lines.
She then used those tags to compare what balance of semantic information people emphasized in these two conditions. What she found was that while people drew causal and non-causal parts in both conditions, they allocated more strokes to the causal parts in explanations than to the non-causal ones. They also emphasized the background a bit more in depictions than in explanations and, as we suspected, spent more of their ink on symbolic displays of motion and of parts interacting in explanations than in depictions. Okay, so already these results are incompatible with a strong version of the cumulative hypothesis, which would have predicted that the relative emphasis on background, causal, and non-causal parts would be maintained even in the presence of arrows. But however reliable, those differences might just amount to stylistic variation that doesn't have any impact on how useful the drawings are for the tasks that they were intended to support. So to measure the functional consequences of those decisions, Holly designed three different inference tasks. The first asked how easily you could tell what kind of action would be needed to operate the machine (here pull, rotate, or push), which is what you might expect a good mechanistic explanation to make clear. The second was to measure how well each drawing could be used for object identification, exactly what depictions are supposed to do. And then the third was a more challenging visual discrimination task where you had to determine which of two highlighted parts was the causal one, requiring you to establish a detailed mapping between the parts of the drawing and the parts of the actual machine. That's something that you'd really only expect from a mechanistic explanation that also preserves enough information about the overall appearance and organization of the parts of the machine. So under the cumulative hypothesis, what we should see is that explanations are at least as good as depictions for all tasks.
Under the dissociable hypothesis, by contrast, explanations might be better for the action task but worse for the object task. And what Holly found was more consistent with the dissociable account, where explanations better communicated how the mechanism worked, but depictions were better for communicating object identity. Interestingly, explanations were not better in this sample for conveying the identity of the causal part in that challenging third task, consistent with the idea that by leaving out a lot of the background details, some of those explanations may have abstracted away the very information that would make it easier to link up specific parts of the drawing with parts of the machine. The bottom line from those studies is that people share intuitions about what is supposed to go into a visual explanation, even if this is the first time they've been asked to generate one, and that it can mean sacrificing visual fidelity to emphasize more abstract mechanistic information. More generally, this work shows how important communicative context and goals are for understanding why people draw the way they do and why depictions look the way they do, and it provides some experimental and analytic tools for characterizing the strategies people use to communicate visual information that is goal- and context-relevant. Okay. And then in this next section, we'll be asking what it would take to develop artificial systems that are capable of humanlike visual abstraction. And here's why. We fundamentally want useful scientific models of visual communication. And the work I've presented so far is representative of where we've prioritized making investments, namely the development of experimental paradigms and datasets to measure and characterize those behaviors in fuller ways across a broader range of settings.
Meanwhile, we've also been investing considerable energy in evaluating how well any member of the steadily advancing cohort of machine learning systems might continue to be relevant and promising candidates for capturing more detailed patterns of human behavior in these high-dimensional tasks, specifically as models of human image understanding as well as models of image creation. Okay. So the task setting that we consider takes inspiration from this famous series of drawings by Pablo Picasso. Some of these are very detailed. The last few are very, very abstract. Yet all are unmistakably bulls. Any scientific model of human visual abstraction worth its salt ought to be able to represent the ways in which these bulls are all different from one another and somehow at the same time all bullish to their core, right? Or rather, as bullish as they actually look to real human observers. So this work was a huge team effort led by Kushin Mukherjee, who is defending this Friday before joining the lab as a postdoc, with contributions again from Holly, as well as Charles Lu, Yael Vinker, who's here actually, and Rio Aguina-Kang. We began from the premise that a strong test of whether we're on the right track towards those scientific models is that we'll be able to build algorithms that can generate and understand abstract images the way that people do. Sketch understanding is one of those tasks that is deceptively simple yet poses a fundamental challenge for general-purpose vision algorithms. For one, it requires robustness to variation in sparsity; some sketches are more detailed than others, you could say. And two, sketches demand tolerance for semantic ambiguity, because sketches can reliably evoke multiple meanings. So we created a benchmark, which we call SEVA, to pose those challenges explicitly. Okay.
So we collected 90,000 hand-drawn sketches made by about five and a half thousand people of 128 visual concepts under varying production budgets. So, here are some examples of what it looks like when people had to create sketches cued by a photo (these are photos taken from the THINGS dataset, if you're familiar with that) in less and less time. So by the time we get to 4 seconds, they are real, real sketchy. We then took those drawings and showed them to both people and 17 different then-state-of-the-art vision algorithms, representing a broad array of different kinds of architectural commitments, strategies, and training protocols. Yes?
Quick question of clarification. Do people get to think about it before you start the timer, or do they see it and then they have 4 seconds?
That's something that's been haunting me for two years. Not enough. I think they... No, no, no. The time we're studying includes both thinking and drawing.
Thinking and drawing. So they see the image and then they have to go.
Okay. So there is this restriction on the overall trial duration. I think that to really get at the bull phenomenon, I would want to give them unlimited planning time and then just limited execution time, and that would have been the way to do it. This is not that. Okay. So there's another dataset that has yet to be born that isolates that more directly. Okay. Yes. So the 4-second drawings are quite derpy. That's the technical term of art for that. But they are what people did in those settings. Okay. So we had all of these different sketches. They look the way they look under those conditions. Then both people and those vision algorithms performed the same sketch categorization task, allowing us to measure the full distribution of labels evoked by every individual sketch. Okay.
The first thing we established was that as you give people more time to think about how to make, and then go out and make, a detailed sketch, those sketches get more recognizable to models and people. They're less ambiguous, as measured by the entropy of the label distribution. And even when the guess is wrong, okay, not the top one, it's more likely to at least be in the right semantic neighborhood, estimated by language embeddings. So that's reassuring. But then we dug in a little bit deeper and found that while some models do, honest to goodness, perform better on the recognition task than other models, the variation across models in terms of performance is totally dwarfed by the gap between models and people. There's reliable signal in human recognition that the models aren't capturing, in terms of both performance, which I'm showing on the left, and also relative uncertainty about the meaning of a sketch. Okay. So that suggests there's still a sizable human-model gap in alignment to close here for sketch understanding. Nevertheless, okay, at the time we conducted this benchmark, we noticed that the CLIP-trained models were outperforming the others, making them a reasonable candidate to begin to explore generative models of sketch production built on top. So we explored the capabilities of a particularly cool sketch generation algorithm named CLIPasso, whose development was led by Yael Vinker.
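To make the "right semantic neighborhood" idea mentioned above concrete, here is a small Python sketch that scores a guess by the cosine similarity between its embedding and the embedding of the true label. The three-dimensional vectors are toy values invented for this example; a real analysis would use embeddings from an actual language model, and cosine similarity is an assumed choice of measure.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy vectors standing in for real language embeddings (values are made up).
toy_embeddings = {
    "bull": [0.9, 0.1, 0.0],
    "cow": [0.8, 0.2, 0.1],
    "teapot": [0.0, 0.1, 0.9],
}

def semantic_similarity(guess, true_label, embeddings=toy_embeddings):
    """Higher values mean the guess lands closer to the true concept."""
    return cosine_similarity(embeddings[guess], embeddings[true_label])

# A wrong guess of "cow" for a bull sketch is a nearer miss than "teapot".
print(semantic_similarity("cow", "bull"), semantic_similarity("teapot", "bull"))
```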
We asked CLIPasso to generate some sketches as well, and also manipulated its production budget. In this case the unit, the currency, is now the number of strokes, which is different, I know. Okay. And what we found is that even while human drawings and CLIPasso's drawings were similarly recognizable across the four production budgets (by the way, this overlap in recognizability is, like, totally coincidental, given how the units are totally different), the really intriguing discovery here is that humans and CLIPasso sparsify their drawings differently. So what I'm plotting on the y-axis on the right-hand side here is the divergence between the label distributions assigned by people to sketches produced by other people and to sketches produced by CLIPasso under those different budgets. So in other words, if you take seriously a functional view of sketches as being for communicating concepts, and characterize their meaning in terms of the full distribution of the labels and meanings that they evoke, then even though the 32-stroke CLIPasso drawings and the 32-second human sketches look different when you look at them, okay, stylistically they are different, they are also quite functionally similar in terms of the distribution of meanings they evoke. On the other hand, as you tighten the production budget, that's where you really start to see much larger divergences between people and CLIPasso. And we're hoping that this SEVA benchmark will be a useful resource for others who are interested in developing models of human-like visual abstraction. Okay. Now, in the second and, I think, shorter part, I'll give you a sneak preview of our newest and mostly unpublished work on multimodal abstractions and how they are used to support statistical reasoning. So, back to that Cartesian coordinate plane. We're a room full of scientists.
You don't need me to tell you that when you're making observations about the world, it's never this clean. Instead of perfect lines, we might actually collect something like this, right? A collection of data points. They land where they land, and from them we try to infer some underlying structure, the actual generative process that gave rise to them. That inferential move is a fundamental building block of scientific reasoning. And we sure don't do it just by memorizing everything we've ever seen and thinking real hard. We use technologies. Okay. Right. At the beginning of my talk, I showed you these examples of the use of drawings produced by hand as a particularly enduring, versatile, and accessible tool for making the invisible visible. I think that's remarkable and worth understanding. But perhaps one of the most impactful technologies to have been developed in the modern era was the invention of data visualization. Like the telescope and microscope, plots help to resolve parts of the world that you can't see directly. But unlike either of those optical technologies, they allow you to see patterns and phenomena that might be too large, too noisy, or too slow to see with our own eyes. They're ubiquitous in the news, the cornerstone of evidence-based decision-making in business and government, and they're indispensable in every field of science and engineering. What I'm showing you here is one of the first time series plots ever drawn, by William Playfair in 1786, to show the balance of imports and exports from England over an 80-year period from 1700 to 1780. For a while, imports exceeded exports, but then the relationship flipped in the 1750s as exports really took off. Here's the thing. Unlike a drawing of one of Darwin's finches, okay, if you haven't seen one of these kinds of images before, it may not be obvious what you're looking at. But once you learn how, it's a kind of superpower, right?
Like so many individual observations can be distilled into a single graphic that tells a story that you can read just by looking. And that's not even all the reasons to care about plots. Because they're such a powerful tool for helping people update and calibrate their beliefs about a complicated world. Developing the skills to read and interpret and even make graphs has long been a goal of STEM education in this country. something that's becoming even more important over time. The New York Times broke a story. This is actually from about a year ago now, um about recovery from COVID related learning loss in mathematics based on work led by some of our education colleagues um at Stanford and Harvard. And it looks like across a bunch of different states, the orange arrows are there, which means that there's been some recovery, but there's still a long way to go. And I think that successful theories that explain how people use these kinds of images to discover and communicate important quantitative insights will help us equip people with the kind of quantitative data literacy skills they need more generally. So I'm going to highlight briefly three directions that we're pursuing in this vein. Our first question asks about the underlying operations that are needed to understand plots. So the strategy that we've been taking is to obtain machine learning systems that can handle questions about data visualizations at all, assess alignment with people, and then interrogate the source of any of those gaps, any gaps there might be. Okay, what we need of course to get off the ground is some way of measuring understanding. So here's what that might look like. Say we're looking together at this stacked bar plot and someone asks you about the cost of peanuts in Las Vegas. Okay, we take take a moment to take it all in. Scan around. Suppose that person then gave you four options to choose from. Okay, I I added the little charcoal thing. 
Now, if you thought it was A, you would be right. But that's not what some prominent vision-language models say given this very same question. So in a herculean benchmarking effort led by Arnav Verma, a current RA in the lab who's actually headed to the EECS program right here in the fall, we've conducted careful comparisons between humans and AI systems on six commonly used tests of graph-based reasoning sourced from across the education, health, visualization, psychology, and machine learning communities. All six of these tests were administered in as parallel a manner as possible to both human participants and several of these so-called multimodal AI systems. The ones that made it into this benchmark had been claimed to display competence on other kinds of visually grounded reasoning tasks. We then recorded not only the overall score achieved by humans and these models, but the full set of error patterns they produced, which allowed us to assess, even when a model or a person got a question wrong, whether they're getting things wrong in similar ways. What did we find? Here I'm going to show you, for each of the six tests we included. They have funny names like GGR, VLAT, CALVI, HOLF, HOLF-Multi. We actually remade those; the bad name there is my fault. And also ChartQA, a subset from ChartQA. We computed how well each of the models did. These are the ones that are showing up on the x ticks in blue, orange, purple, and red. And we compared that to how well humans did, which will show up in green. These were US adults who had taken at least one high school math class. Okay, first I want to give you a sense of how well these people did. That's our reference point here. This is how well all the models did.
So we have in this study two variants of BLIP-2 FlanT5, three variants of LLaVA-based models, specialized systems such as MatCha and its base model Pix2Struct, and finally a closed, proprietary model, GPT-4V. Across all these assessments we did see a meaningful gap between models and humans, under both a more lenient and a more strict grading protocol. This is a gap that we might have missed had we relied exclusively on the chart understanding benchmarks that are currently most popular in the machine learning literature. So this is the reason why we included ChartQA, which is the rightmost facet on this slide. Okay, where the gap seems a lot smaller. CALVI, which is the third one, is really interesting because these are adversarially designed plots that have, like, funny y-axis limits that require you to really attend closely. So that's one where we also see somewhat larger gaps. We also analyzed their full error patterns, which can be really telling when no one is quite at ceiling. Again, there's only one way to get all the questions right, but there are lots of ways that you can be wrong, and wrong reliably. So, we found that even though GPT-4V might look to be approaching human-level performance, none of these models, GPT-4V included, generated human-like error patterns. This is shown by all the dots here falling well below the green shaded area, which represents the human noise ceiling. So the upshot is that while currently developing VLMs remain exciting and promising test beds for developing and parameterizing the hypothesis space of possible cognitive models of visualization understanding, there still are these systematic behavioral gaps that are worth interrogating further to realize those models' full potential. In parallel, we've also been developing experimental paradigms to probe a related facet of visualization understanding.
The ability to select design, select the appropriate plot to address um your epistemic goal. Okay. So, I'm going to talk to you about that next. Um the way we set up the problem is to imagine that there's some question that you have about a data set, right? some what I've been calling epistemic goal that uh you know a person is trying to satisfy like for example like which group is better okay let's say the left agent is trying to pick the plot to help shift the person's beliefs appropriately okay but if they had a different question in mind maybe they might need a different plot okay that's the intuition so this is a line of work that was launched by Holly Huey um and we formulated in a study hundreds of different questions that could in principle be answered by using real publicly available data sets. In this case, your data sets that ship with base R. Um so here's an example of a scary one. Um tracking airplanes and bird strikes. Okay. What is the average speed of aircrafts flying in overcast skies that encounter bird strikes at 0 to 5k miles? We then presented participants with a menu. Can I ask you? Sure. What is 5k altitude? Yeah. Sure. It's not 5,000. Sorry. If I Hold on. Hold on. That might be a That may be a typo on my part. I don't think the question inherited that. Anyways, don't worry about it. I'm not I'm I'm I'm starting to, but I'm I'm going to resist that urge right now. No, no, I think that Yeah. So So, but these a lot this is a standard they're full of all sorts of typos. This is a templated This is a templated thing. I think there are basically a lot of these questions are kind of funny and I think they could have been smoothed over a bit more. So, this is like the Yeah. Yeah. Yeah. Yeah. No, no, it's good. It's good. So, um so we had all these questions. Some of them were like better put than others. 
Um but we presented participants with a menu of possible graphs that they might show someone else in order to help them answer that question, however they interpreted that question. Okay? whatever it is, those are the strings that we're showing to people. Okay? Um, and they could choose from this menu either a bar plot, line plot, or scatter plot. Some of them were more distilled, some of them were more disagregated. Okay? And then we measured how often they picked each one in order to construct a choice distribution over plots. Okay? So, here's what it looked like um on average for the retrieve value questions in our stimulus set. This is like one kind of task and there are like other ones that'll like ask you to like compute some kind of difference between two values. This is the retrieve value subset of items. We then tested various um hypothetical strategies that people might have used to pick the plots that they did. Clearly they are showing some kind of bias here. Um it's a black curve. The purple curves here um represent the predictions of like bar plot line plot or scatter plot purists. Okay. they just always went for the bar plot but just like didn't care whether they had, you know, three variables plotted or more. Um, we also considered the possibility that people might prefer more um targeted visualizations overall but not really care what type of graph they were um sharing. Um, maybe they might prefer ones that show more of the data, maybe ones that hide, you know, less of the variation in the data. Um the best candidate we found that we considered was the proposal that people are maybe actually sensitive um to the features of those plots that were relevant for answering the question and in fact actually predicted the performance of about 1,700 other participants who tried answering every one of those questions however weirdly put when paired with every possible plot. 
Okay, so we went and actually measured performance, and then we constructed the distribution based on those performance levels, and then used the data from that task in order to generate predictions of the audience-sensitive hypothesis. And this is how the shape of those curves looks when we're only considering the retrieve-value items. But if you look at the data set as a whole, we found that this audience-sensitive proposal did well across the board. So what I'm showing here is essentially a model fit measure defined over the divergence between the predicted and the actual human choice distributions. It has a name, the Jensen-Shannon divergence, like an average of KL divergence, but both ways. So I'm excited about that result as an initial validation of a strategy for measuring visualization understanding using more open-ended tasks. And also because it suggests that even non-experts, you could say, people who might not be professional scientists, are sensitive to those features of plots that make some suitable for answering some questions and not others. And honestly, as someone who teaches intro stats for a living, that gives me hope. In the final leg of our journey, we're going to take another critical look at this problem of measurement. Okay. So a few minutes ago, I showed you some results when using these six tests of data visualization understanding. These are the ones that we have today, which is why we used them in our benchmark study. Our question in this last study, in work that's actually now in press and led by this extraordinary postdoc in our lab, Eric Brockbank, also with Arnav Verma, is: what are these tests actually measuring? Are they measuring the skills in the best possible way? And can we do better? Okay, so here are some initial steps towards answering those questions, to get some traction.
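The model-fit measure described above, a Jensen-Shannon divergence between predicted and observed choice distributions over plots, can be sketched in a few lines of Python. Everything numeric here is made up for illustration, and turning accuracies into a predicted choice distribution by simple normalization is an assumption of this sketch, not necessarily how the study's audience-sensitive predictions were built.

```python
import math

def normalize(weights):
    """Rescale non-negative weights so they sum to 1."""
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}

def kl_divergence(p, q):
    """KL(p || q) in bits; assumes q[k] > 0 wherever p[k] > 0."""
    return sum(p[k] * math.log2(p[k] / q[k]) for k in p if p[k] > 0)

def jensen_shannon(p, q):
    """Average of the KL divergences from p and from q to their mixture m."""
    keys = set(p) | set(q)
    p = {k: p.get(k, 0.0) for k in keys}
    q = {k: q.get(k, 0.0) for k in keys}
    m = {k: 0.5 * (p[k] + q[k]) for k in keys}
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Hypothetical accuracy of a separate group answering one question with each plot.
accuracy_by_plot = {"bar": 0.9, "line": 0.6, "scatter": 0.5}
audience_sensitive = normalize(accuracy_by_plot)  # assumed mapping to choices

# Hypothetical "bar plot purist" who always picks the bar plot.
bar_purist = {"bar": 1.0, "line": 0.0, "scatter": 0.0}

# Hypothetical observed choice proportions for the same question.
observed = {"bar": 0.5, "line": 0.3, "scatter": 0.2}

print(jensen_shannon(audience_sensitive, observed))
print(jensen_shannon(bar_purist, observed))
```

Lower divergence means a better fit, so in this toy case the performance-based prediction fits the observed choices better than the purist strategy. With base-2 logarithms the measure is symmetric and bounded between 0 and 1, which makes it convenient for comparing several candidate strategies on the same scale.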
We started with these two, GGR and VLAT, these being some of the most widely used and established tests. We gave GGR and VLAT as a composite test to a large and diverse sample of US adults. Actually, two samples: one that was recruited on the UCSD campus and another that was recruited over Prolific under the constraint that it had to be a demographically representative sample. And what I'm showing on the left is that we get pretty convergent estimates of the difficulty of individual items in both the college campus sample and the US representative sample, which I think is reassuring. On the right, what I'm showing is that people who did well on one test often did well on the other, suggesting that maybe the two tests are measuring some of the same things, or similar things. And the question is, what are those things? One possibility is that those two tests track how much easier some plots are to understand than others. If so, those plots should be reliably hard or easy across the board. Like, maybe bar plots are kind of easy, but stacked area charts are more challenging. It seemed to us that clearly there's a lot more going on here. Performance wasn't always consistent for a given kind of plot, within or across tests. And there weren't even enough items to be able to establish a direct link between the type of graph and performance, because in each of these tests there's generally one instance of each class of graph. So that also made it challenging. We also dug into the patterns of mistakes that people made and found that the best way to predict those patterns on these two tests wasn't the kind of plot or even the type of question, right? Maybe find the max, or identify clusters, or characterize a distribution, or retrieve a value, right? These are all common ways of describing the ontology of tasks involving plots.
But it really seemed from this analysis that um other underlying factors um that aren't well described by the ontology um are really accounting for error patterns much more efficiently. So here I'm showing you how much better a parsimonious four factor model does than one that uses the groupings that you might think of um to uh use to organize the set of skills needed to understand any graph. Okay, so I'm not going to unpack everything on the slide, but this is just meant to illustrate that there seems to be something that is going on here that isn't obviously mapappable to the ways that we talk about what the component skills are and that you might actually see in textbooks or in um instructional materials when it comes to how to break into data visualizations. And um even if existing assessments might not be um testing and characterizing visualization understanding in the best possible way, we're really trying to take that as a glass half full call to action um to develop improved measures. So stay tuned for those. um you know more generally um the reason I I think this work establishing the perceptual and cognitive foundations of data visualization is so important is because it'll give us a chance to use um what we learn to eventually help people um learners um in real educational settings calibrate their understanding of a complicated and changing world that we can ever only observe a part of. And these kind of um integrative efforts, if you will, that connect fundamental science um to that wider world that we all inhabit exemplify where we're going with all of this. Um we really want to develop psychological theories that explain how people use the suite of cognitive technologies that we've inherited and continue to innovate on. Um we want to understand why that toolkit looks the way it does, what future cognitive tools might work even better. 
Um, in the long run, I I I I think that understanding how these tools work and how to make them better like like really matters. Um, because it's like these tools that are at the heart of two of our most impactful and generative activities. Um first education which is the institution and maybe more importantly the expectation that every generation of human learners should be able to stand on the shoulders of the last and see further and design the suite of activities and habits of mind that help people continually reimagine how the world could be better and then go out and make it true. So with that, I want to thank all of the folks who've been involved in this work um and other lines of work in the lab. It would not have been possible without an amazing research team and network of collaborators um and colleagues in many different places including here. So I want to thank all of them all of you for your attention and I'm happy to take questions if we have time which you might not. Okay, great. Thank [Applause] you. I see hand hand hand okay thank you so much for the talk um one thing I'm curious about is so again a lot of your examples specifically relating to depictions there seem to be this kind of one-dimensional axis between things that are like more detailed and faithful versus more sparse but I'm kind of curious about cases where people watching diverge and and produce things that are like actually false or like counterfactual I'm thinking like if I'm drawing a glass of water and I want to indicate that it's full I'll like shade it in blue even though like in real life glasses aren't like blue when they water them. And I feel like that depends a lot on culture and language and stuff like that. Like I, you know, speak a language where water is described as blue. I like have been around pools which are usually painted blue on the inside and so on. And that could sort of depend. So I'm just wondering how you thought about some of those those cases. Yeah. Yeah. 
Yeah, there's a there's a there's a large literature in philosophy in like the area of aesthetics that I take inspiration from um that begins from a similar premise which is like how is it possible that we understand pictures that are false, right? Like pictures of like fictional individuals or unicorns or like glasses with like blue water even if it doesn't look that way. Um the tack that um we've been taking in our own empirical work is to not begin from that premise, right? Which is that rather than thinking of them as false, this is the actual data that people are generating under some objective and we just we're trying to figure out what that is, right? And so the reasons why they might render the water depicted in that depicted glass is for some reason and we're interested in those stakes that you identified. to what degree it captures the faithfully the visual appearance the phenomenology of looking at a glass of water in a in a room in where the glass is in front of you as opposed to um a kind of acquired convention for how you depict water for example I think those are like very reasonable sources of like constraints on why people make those representational decisions so I think those are like the kind of questions it's just rather than thinking of some drawings as good or bad false or true. We just find it a lot more useful and productive to think of like okay these are the drawings people make under those conditions where why why do they look that way and not another if that helps. Thanks for the question. Sure. Um thank you very much. Uh I am curious about so I I thought something you only very briefly touched on that is super potentially interesting is the humanness of errors that uh that artificial systems make in uh graph and visualization and I guess in a way actually that seems especially useful for an artificial system because if I'm producing a visualization I know what people will understand it to if they understood it correctly. 
But I don't know I I might have a lot of difficulty imagining how it would be misunderstood. And so that actually seems like like a task which was literally like how else might might this miss my be misunderstood understood would be very very valuable uh thing and I want to ask like what's what is our state-of-the-art model of that of like how graphs are misunderstood visual misunderstood and is it an interpretable state-of-the-art model from a scientific oh gosh give us insights into yeah workings of uh the actual inner work okay visualization it's really it's really it's really really good question so I'm gonna engage with it which is um so there's a phenomenon which you know some of us in the room study which is like perspective taking um which is hard but doable in some contexts it can be possible to imagine what the world must look like from the viewpoint of someone else and there are various paradigms for studying that it also seems like the capacity for doing that very quickly and accurately can change with practice. And so there's like a role for expertise and experience to play in like shaping your ability to do that. One way in which that manifests I'm not going to refer to like the VLM results at all here but just like to sketch the like shape of the problem. It's like teachers like the kind of expert one of many kinds of expertise that teachers really skilled teachers develop is the ability to diagnose notice it is called misconceptions when they listen to learners describe how they're thinking about a problem. Right? So it may be that the final response or answer that a student gives to a math problem is like wrong but it's not only that it's wrong like the the relevant question isn't always that it's wrong or right but rather what is the nature of the misconception or misperception or what is what is the gap between um the normatively correct way of representing the problem and proceeding through the different steps and whatever the student did. Okay. 
So, one way in which I think it okay, maybe I will just very very briefly if it's okay um speak to how we've been thinking about this question in the modern era of extremely large um machine learning systems is to interrogate the kind of um operations and like conceptual primitives so to speak that these systems rely on when they get an answer right or wrong. using the suite of like tools that go under the banner of like mechanistic interpretability, but you could we could we could we could call cognitive systems neuroscience for artificial neural networks in order to diagnose where right answers and wrong answers come from. And it'd be really really nice. I'm not I'm not going to go into those in detail because of the but but but but like but like that's a kind of like strategy you might take to connect what is the the operations but they're being performed in these in these systems that do not immediately lend themselves to those kind of interpretations um at the same time and then connect those to what has been called like knowledge knowledge knowledge graphs the underlying knowledge graph that a person might hold or not hold. Okay. So like that's like a a like a way of setting up the problem I think and like sometimes the issue the bottleneck might be perceptual as such and other times it might be like a bottleneck or the gap might have to do with like a reasoning step and the goal is to really expose what those steps are have tools for diagnosing them so that in principle you could use them to diagnose those misconceptions or misperceptions mistakes um missteps um in anybody or any system that's very like very very hard of course um but these like challenging problems are like a I think like a a tool that we can use in in like towards that end. Okay. I hope some of that made sense. I I like a lot of thoughts. Can I ask a follow relation to this? Yeah. 
Like so I think it makes so in particular you helped us out here by highlighting that red like the right part of the graph to pay attention to right and one of the reasons why you might have trouble answering this question. I'm just trying to see if I understand is that you might not know which part of the thing to pay attention to to answer the question. Yep. Um or but or you might know what part to pay attention to but you might not know what to do with it. How do you take that red rectangle and kind of relate it to the information the the information on the y ais or you might I know even if you know how to do that turn that into the answer of the question are those the kind of that's kind of well but yeah like those are the kind of teaching moves that like a human educator might actually take you could yeah people have even tried to break this down like some of Eric Schultz's group work for example where they say like you could ask the model or people like how big is the rectang red rectangle in the second column something that doesn't require any reasoning but just or something or like which blue rectangle is the high like for example is that the kind of thing you're doing yeah yeah yeah so we started with stimula like these so this is a very very new work that's being led by yes exactly so um this is being led by Alexa Tardaglei um in collaboration with Chris Pototts and the the we started with stimula like these like the the real wild uh plots and questions and then uh realized we wanted to drill down. So they're like both taking these plots and asking a variety of different questions including these more basic ones so to speak. 
Also, we've realized that like really stripping these down to much simpler versions of it that still pos like still contain the like core like multimodal integration challenge like even like really currently like running these studies on like number line stimula essentially to like understand how you might judge the distance between two data points on the number line. Yeah. for example, but like there are like all these different routes to it. Um because this is like this the the point is that this is complicated and so like decomposing it is like part of the right could be like how you teach these skills break that into exact assessment human. That's right. That's right. Yeah. Right. So there's assessment and then also like the reason why this black like shaded um you know charcoal rectangle is there is because like there's some guess that I had about what might be a bottleneck to rapidly looking at the appropriate feature of the plot. Okay. Could I follow up on that? Yeah, sure. Okay. Yeah. Um, when I first looked at this uh plot, I thought it was actually ambiguous and I wasn't sure whether uh the the height of the bars all started at zero and we're seeing one of them in front of another or you know starting with the cheapest one like dodge equals true. Yeah. Um but that's not my question. My question is this. Um uh you've been focusing on for adults who've been exposed to like lots of different kinds of media. what are what are the best ways to pres present information so that so that people can get it and get it efficiently so forth. But there's another question which is I think for when we first see a new kind of graph like this might actually be the first time I ever saw one that's stacked up. At least it's been a while since stacks up like this. That's what they look like. It's not so obvious when you first see it. Yeah. As we get very used to seeing them, seeing things that are different can become really harder. Right. 
So, for example, event-related potentials where negative goes up: that drives me crazy. I see it. Yeah. Right. So I wonder, I mean, what do you think about the question of how these kinds of representations should be structured so that they're easy for kids to learn? Yeah. Right. Given that you start out with none of these representations, what would you put into the earliest ones that you want kids to learn about? How would you progress from there? Yeah, that's a really good question. Okay. Yes. And actually, a few minutes ago Nancy and I were talking about something kind of related, when it comes to how, you know, people break into this class of visual inputs at all. Right. There's a sense in which you're building on top of a lot of neurotypical, ordinary visual and cognitive development in order to grasp the presence of different shapes in certain spatial arrangements. It relies on literacy as such. There are a bunch of conceptual primitives and more basic competencies that you might need to build up first. It might be that there is a sequence of experiences that build from the... That's a really good question. I don't have answers to that curriculum design question, but I think it's a great question, and it feels like there are thoughts that I feel... Just to build on that: it seems like, you know, the whole back third of the cortex does vision, and it's this spectacular set of machinery that extracts all these different rich kinds of visual representations, you know, from orientations to heights to shapes to landscapes, all of which are possible spaces we can use, right?
So it seems like the essence of making a good graph is figuring out how to make a visualization that, you know, taps into some of that machinery. Right. Like what makes visual scenes in general easier and more fluent to process, right? Things that aren't particularly cluttered, right? If you're interested in the problem of visual search over those scenes, aspects of those processes are co-opted in order to identify the appropriate subregion of this image to look at, right? You might imagine designing those initial graphs to cohere more with those kinds of scenes, right? And then there's this step of mapping the components of these scenes to concepts that you also need to learn about. And that can be really hard initially and then become faster and faster and easier and easier over time. And I think it's really fascinating what is happening as that takes place. Yes. And also... Yes. But then also, what should I do? This is really cool. Why don't we wrap up and allow people who need to leave to get to things to do that? Okay. But other people, please linger here. You'll be here for a little while. We have a reception. We have 45 minutes or so for people to interrogate you. So, thank you. Amazing. Thank you. Thank you. Thank you.