And now we move on to the second presentation, by Tim Behrens from Oxford, on learning, volatility and the anterior cingulate cortex. I should say, as a matter of introduction, that Tim Behrens has multiple talents. The Oxford group in neuroimaging is very well known for the development of beautiful tools and methodological advances, especially in the domain of fiber tracking in the human brain, so there is a very strong methodological component of human brain imaging. But Tim is also very interested in the development of models of decision making and evaluation, and I think this is what he is going to describe to us today. So, Tim Behrens.

Thanks very much indeed, Stanislas, for inviting me, and to the other organizers. It's an amazing place and a real pleasure to come along. It's a bank holiday in the UK, so I've given up a day's holiday to come over to Paris.

I'm going to speak today about learning, volatility and the ACC. I don't know quite what the audience is like, but for a neuroscience audience the letters ACC have some significance. For other people here, it means the anterior cingulate cortex, which is a particular structure in the brain. I'm going to try and talk about this in quite a specific way. I'm going to take a couple of results that have come from making lesions to the brains of macaque monkeys. These two results at first appear to be really rather different from one another, in rather different fields of neuroscience, and I'm going to try and explain them with some computational and economic ideas and with some data from human brain imaging experiments.

So the first thing to do, I think, is just to take you through these first two results. This is a study that came from our lab in the field of reinforcement-guided decision making. All the way through here I'm going to be talking not about the process of making a decision, like Hilke was
talking about, but rather about the process of learning from the outcomes of your decisions so that you can make a better decision in the future. This is the behavior of some monkeys, and all it shows is the influence of your previous outcomes on your current decision. This is the very last outcome, and this is the outcome just before that. You can see that the very last outcome that you just experienced has a great influence over what you're about to do next. The one before that has rather less influence, but still a significant influence. You can see this integration curve, such that the monkeys are in essence integrating over a number of previous experiences in order to make the best choice on the current trial.

And then we would make a lesion, so we would remove part of the brain. In this case we're removing the sulcal portion of the anterior cingulate cortex. I'll tell you a little bit about that later, so don't worry if that's not familiar to you. When we make this lesion and then ask the monkeys to do the task again, we can see a dramatic effect. Now the only trial that has any significant influence over the monkey's current behavior is the very last trial that it experienced, and this beautiful integration curve just completely collapses after a lesion to the anterior cingulate sulcus. So that's the first result I'm going to try to explain.

The second result I'm going to try and explain appears to come from a completely different field of neuroscience. It appears to be in the field of social decision making and social valuation. It is based on the fact that monkeys will sacrifice food, will pay, to do various things, and one thing they'll pay to do is to look at other monkeys, particularly if the
monkeys are higher up in the social order. This monkey here has the opportunity to get a piece of food, but just behind the camera there is a still picture of this monkey, and so the question is: will the monkey take the food or not? If you run it, you can see the monkey is not prepared to take the food. It's very difficult to interpret from his behavior what he's doing, but it looks as though he might be trying to occasionally snatch glances at the other monkey, as if he's trying to get some social information about this other monkey. There are various pieces of evidence from this experiment which suggest that's the case, which I'll tell you about later.

In the meantime, I'm just going to let you know that if we make a very similar lesion, but to a slightly different part of the brain called the gyral portion of the anterior cingulate cortex, which is just next to the sulcal portion, then we can get behavior like this. Here the monkey happily takes the food, and then even after he's eaten the food shows no interest at all in the other monkey, in the social stimulus that's available.

Okay, so those two results appear to be extremely different, but I'm going to try and persuade you, over the course of a number of ideas and experiments, that these two results are essentially the same thing but in two different domains. I'm going to try to persuade you that they're all about how much you value a piece of information.

As I said, I'm just going to tell you a little bit about the anterior cingulate cortex, because I know that not everybody here will know about it. This is the anterior cingulate cortex in a macaque monkey. This red area here is the anterior cingulate sulcus, which caused the reward effect, and this blue area here is the
location of the lesion for the anterior cingulate gyrus, which caused the social effect. Often in neuroscience, if we want to understand the role of an area, we do so by looking at its function, but we also look at its connections, because the connections are really important in constraining what function the brain region can perform. They tell us what information it has to process and what other areas it can influence. It's a bit like looking at a computer circuit: you wouldn't just look at the processor, you would look at what other parts it has access to and what other parts it can influence.

So here I'm just going to tell you a little bit about the connections of the anterior cingulate cortex, particularly about differences between the sulcus and the gyrus, because they're relevant here. The sulcus has various connections that the gyrus just doesn't have, and these connections are mainly to regions that know about the motor output of the brain, that know about your own actions. These regions here, the primary motor cortex, the parietal motor areas, the spinal cord, know about what actions you have taken. And if you remember, lesions to this region affected the influence of reward on your own actions.

The gyrus, by contrast, has many connections that the sulcus does not have, and these connections are predominantly to regions that deal with stimuli that might be emotional or social in nature. All these regions here have been hypothesized to code social information, or information about expected pain or emotions. So this region has access, which the sulcus does not have, to information about other agents, and that's part of the reason we hypothesized that this lesion would have an effect on
the social evaluation tasks that I showed you, and it's nice that it's therefore understandable that it would have this effect. But there are also some connections that are common between the two, which is why we have some hypotheses about their roles being rather similar. It's not just because they're next to each other. Firstly, they're both connected very strongly to regions that process reward and motivation. These are some of the regions that Hilke was talking about, and they're very involved in processing reward and motivation. But between themselves they also share one of the densest connections that you can find anywhere in the brain, between the gyral and the sulcal portion of the anterior cingulate cortex. So reward, valuation and information from reward are going to affect signals in both of these regions.

Okay, so that's the end of my intro about the anterior cingulate sulcus. I'm just going to go back to this first study, and I'm now going to try and explain this data computationally and with some functional MRI data. Just to remind you, what happens when you remove the anterior cingulate sulcus from a macaque monkey is that he's no longer able to integrate information over a number of trials in order to make the best decisions.

In order to understand what this might be, we figured the first thing to know is what might determine this integration length. Here's that plot again, without the error bars just to make it clear, and you can see that our monkeys, in order to make their next decision, only need information from perhaps the last four or five trials. But at a very similar time there was an experiment done in Stanford by a group including Bill Newsome, Leo Sugrue and Greg Corrado, and they had rather a different experience. They found their American monkeys, instead of being able to make a decision on the
basis of four trials' information, needed nearer 35 or 45 trials to get enough information to make a decision. Now, I can't tell you how proud we were when we found out that our monkeys could learn in just four trials but the American monkeys needed 45 trials, and we were thinking there might be something unique to the educational environment in Oxford which led to fast learning. But then we thought there might be some more ecological and evolutionary reasons to describe this difference, and we looked at differences between the experiments.

The crucial difference is that the Stanford monkeys were operating in an environment which was stable. The probability of reward on a particular option might be the same for a long run of trials, such that information from 50 trials back is still relevant and important, because it still might influence what happens on the next trial. Whereas for our monkeys the reward probabilities changed very fast indeed, approximately every 25 trials. If our monkeys had had such a long integration curve, they would have been acting in a very suboptimal way, because information from 30 trials ago, because of the experimental design, bore no relevance to what might happen on the forthcoming trial. So they really should not use that information, and it's great, because they do not.

So we wondered whether these monkeys were able to adaptively process the statistics of the reward environment they were working with: to process the volatility, which is a statistic of the reward environment, and then to set their rate of learning, their rate of integration of reward, appropriately, optimally. So we designed an experiment to test this. Oh no, I'm just going to tell you a little bit about how you might do that first. So, in order to know how
you might do that, and in order to understand some of the rest of the data, you need to understand something which has become a very influential theory over the last 100 years in psychology, and recently in brain imaging as well, and this is reinforcement learning. This gives us some ideas about how one might change that integration length and how one might update one's preferences or valuations in the context of receiving some new information. And it's really very simple indeed, or at least the part of it I'm going to rely on is. It just assumes that before you make a decision, like Hilke was saying, you have some prediction of the value that you're going to get out of this decision. Then you receive an outcome, and your job is to update this prediction such that next time you make this decision the prediction is rather better, because then you'll make a better decision next time around. That's what we call learning.

In order to do so, we think that people compute a prediction error, which is the difference between the outcome and the prediction, and you make some new prediction which is just the old prediction plus some fraction of the prediction error. This really rather simple algorithm explains a great deal of neural data.

So here's the equation again. This prediction error has been a very influential thing. It tells you how much information is available from the event that you're currently witnessing. This learning rate doesn't tell you anything about the information you're currently witnessing, but it tells you how much you should weight this particular piece of information in contrast to information you've already received, because this value here, the previous predicted value, contains all the information you've previously learned.
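The update rule described above can be written down in a few lines. What follows is a minimal sketch rather than the model actually fitted in these studies: the function names, the starting prediction of 0.5 and the example learning rates are assumptions chosen for illustration. It also shows a consequence of the rule: unrolling the update, an outcome k trials back contributes with weight alpha * (1 - alpha)^k, so the learning rate alone sets how far back the integration reaches.

```python
def update_prediction(prediction, outcome, learning_rate):
    """One delta-rule step: new = old + learning_rate * (outcome - old)."""
    prediction_error = outcome - prediction
    return prediction + learning_rate * prediction_error


def run_learner(outcomes, learning_rate, initial_prediction=0.5):
    """Apply the update to a sequence of outcomes (1 = rewarded, 0 = not)."""
    prediction = initial_prediction  # assumed starting value
    for outcome in outcomes:
        prediction = update_prediction(prediction, outcome, learning_rate)
    return prediction


def integration_weights(learning_rate, n_back):
    """Weight of the outcome k trials back, for k = 0 .. n_back - 1.

    Unrolling the update gives learning_rate * (1 - learning_rate) ** k.
    """
    return [learning_rate * (1 - learning_rate) ** k for k in range(n_back)]


if __name__ == "__main__":
    outcomes = [1, 0, 1, 1, 0, 1]  # hypothetical reward history
    for alpha in (0.1, 0.6):       # hypothetical slow and fast learning rates
        weights = integration_weights(alpha, 5)
        print(alpha, run_learner(outcomes, alpha), [round(w, 3) for w in weights])
```

With the slow rate the weights fall off gradually across past trials, while with the fast rate almost all of the weight sits on the most recent outcome.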
Then this learning rate tells you the weight of the current trial. What you should do is set the weight of the current trial to the statistical value of the information. If you're working in a very fast-changing environment, new information is much more valuable to you than old information, but if you're working in a very stable environment, old information is much more valuable to you. So you should set this learning rate in order to maximize the use of the information that you receive.

I'll just show you that there's a monotonic one-to-one relationship between this learning rate and the integration curves that I've been showing you before. If I plot these integration curves with a very slow learning rate, I get a very long integration curve. I can plot it instead with a very fast learning rate and I get a very sharp integration curve, much like my monkeys. And you can choose one in the middle and get one much like the Stanford monkeys. So simply by fiddling with this one parameter when they update their beliefs, the monkeys might be able to change those integration curves that we were talking about before.

Okay, so now on to an experiment. It's a very simple experiment where humans simply have to choose between a blue and a green square. There were different monetary values associated with these, but don't worry about that for now. Essentially the task here is to learn the probability that either blue or green will be rewarded, and only one of them will be rewarded, so you have to learn the probability that it's blue rather than green. This is the probability that the subjects had to learn. This is the probability that the outcome will be blue, and if they
choose blue and the outcome is blue, then they'll be rewarded. You can see here the trick that we've done. We have a phase in the experiment when this probability is stable, it doesn't change at all, and we have a phase when this probability is changing fast, is volatile. Just by observing the outcomes, subjects are going to have to try and learn this probability. But subjects could never learn this probability exactly, of course, because what they can learn depends not on the underlying probability but on the observations that they can make, the outcomes of particular trials. So we had to build an optimal learner, a Bayesian learner, which told us how much one could possibly learn from the observations that the subjects receive.

The output of that Bayesian learner looks a little bit like this. There are several things to notice about this, but the key ones are these. By the end of the stable phase, each new observation has very little impact on the estimate of the probability, which means that the learning rate, the amount that the newest piece of information is contributing, is very small indeed. By contrast, in the volatile phase a new piece of information can have a massive impact on what you should think. If you understand that the world is volatile, then you should believe that a new piece of information should change your beliefs an awful lot.

So we went and measured this behaviorally. We measured subjects' learning rates in the stable phase and then again in the volatile phase, and you can find a significant difference between the two. So, much as we thought from the monkeys, humans are able to change their learning rate online based on the statistics, the volatility, of the current environment. And not only can they do it, but they can do it pretty well. These black dots here are the behavior of
the optimal Bayesian learner. So humans are meta-learning: they're hierarchically learning how fast to learn. And we can flip the experiment and perform the volatile phase first, so again it's not a block-ordering effect.

So then we can move to imaging. Much like in many experiments, changes in our predicted reward happen throughout the course of the experiment. But as I've just explained, it's also true that the volatility of the environment, and therefore how much you should learn from each new piece of information, changes throughout the course of the experiment.

So we run this experiment, and first we can look at some rather simple contrasts, which include what parts of the brain are active just when you're making a decision. You can see huge amounts of the brain active just when you're making the decision. Then you can look at what parts of the brain are active when you're monitoring the outcome of a decision. This here is the cingulate cortex. You can see again huge parts of the brain, including the cingulate cortex, but slightly different parts of the cingulate cortex are active when you're looking at a reward versus when you're making a decision.

Then we can do the crucial contrast, which looks, at the time of the outcome of a trial, at which brain areas have activity which parametrically fluctuates with the volatility of the environment, i.e. which part of the brain tells us how much we should learn from this current piece of information. When we do that, we find activity in the sulcal region of the anterior cingulate cortex, exactly the region that we lesioned in those macaque monkeys. And we were interested to note that this region of the anterior cingulate sulcus lies exactly at the border between the regions that are interested in monitoring outcomes and regions that are
interested in making future decisions, which is appropriate, since the role of this region is to learn the impact of a current outcome on a future decision. So we propose this region is involved in telling various other systems how much to learn from a current outcome, and I'm just going to show you a couple of plots to try and convince you it's a learning signal.

The first is just a plot of its effect through time in a trial. There are a number of things that the anterior cingulate cortex does that are involved in processing and making a decision, and you can see that this really is not one of them at all. There's no effect here during the time when the subject is making a decision. There's no effect after they've made a response, before they see an outcome. The moment they see an outcome, there is a massive response which is proportional to the volatility of the environment, to the amount they should learn.

The second thing is that not only do we have variability within an experiment in how much a subject should learn, we also have variability between subjects. Some subjects in general are very fast learners and other subjects in general are rather slower learners. We can perform a correlation in this region across subjects between the BOLD signal at the outcome time and their overall learning rate, and it turns out that the faster you are to learn in general from your outcomes, the higher your BOLD signal in this anterior cingulate region will be at the time of the outcome.

Okay, so that's the end of part one, the part where I've tried to describe the sulcal effect. I'm just going to go back and remind you of the gyral effect. Animals with ACC gyrus lesions will ignore the presence of a social stimulus when offered some food, and even afterwards they won't look at the social stimulus in an attempt to get social information.
Just another quick slide on that result. What I'm plotting here is the amount of time the monkeys will wait before taking that food reward, and along the bottom I'm plotting the various different monkey images that are shown to them in the background. You can see that a staring dominant monkey makes them wait the longest time. Female monkey perinea, which is the scientific name for the genital organ of a female monkey, make them wait a long time. As you go down, it seems that the more socially valuable the image is, the longer they will wait. But if we make a lesion to the anterior cingulate gyrus, you can see that the effect completely disappears. Not only do they wait less time on average, but there's now no influence at all of how socially useful, how valuable, the information will be.

Okay, so now I'm going to run a very similar experiment to the last one, with one simple trick. Before making a decision between a blue or green square, the subjects are going to receive a piece of advice from a confederate sitting outside the scanner. So here the person is going to say "choose the blue option". But the confederate sitting outside the scanner, via various manipulations, might have very different motives from the person in the scanner. He may want the person to win, he may want the person to lose, he may want the person to do not quite as well.

So the person in the scanner now has two parallel learning problems. The first learning problem is that he has to work out the probability that the correct color is blue, exactly as he had before. This is a simple action-outcome learning problem. The second learning problem that he has at the same time is to learn the probability that the confederate, the person sitting outside
the scanner, will give him good advice, is on his side. So he's learning the current motives of the confederate. Obviously, when we do these experiments we don't actually let somebody sitting outside dictate the experiment. We control it with a computer, but the person inside the scanner believes they're playing against a real human being. So we can control the confederate's lies. This is the probability of true advice.

Then we can use the same trick as before. We can control the volatility, or the value of information, about the action that you make, and we can control the volatility of the social information, and we can ensure that these two signals contain no information about one another: they decorrelate from one another. So when you receive a single outcome, you receive two pieces of information at the same time, one about the confederate and one about the reward outcome, the action-outcome contingency. But crucially, at some points in the experiment both will be very informative, and at other points in the experiment there'll be little information about the reward but lots of information about the social situation, and so on, so that we can see independent fluctuations in the BOLD signal between regions coding these two different parameters.

The first thing to do, then, is to assess whether it's even possible that social information is processed in a similar way to reward information. Reward information has been addressed in many single-unit recording studies and many fMRI studies using this very simple reinforcement learning algorithm. But it might be thought that social information is processed by a much more complicated process, which does a lot of mentalizing, imagining what other people would do. So in order for statements
about the volatility and the learning rate to be valid, we need to assess whether our subjects are integrating information about other individuals in a reinforcement-learning-style way. So this is a first, behavioral finding. We've modeled their behavior based on just the magnitude difference in reward, a reinforcement learning algorithm based on the outcomes, a reinforcement learning algorithm based on the confederate advice, and then two other possible strategies that the subjects might have used for learning about the confederate advice. This one is just following what the confederate says all the time, and this one is using a tit-for-tat strategy, so assuming that the confederate will lie if he lied last time, or will tell the truth if he told the truth last time. Although we can't be sure from this plot that they're using exactly reinforcement learning approaches, we can at least be sure they're integrating information across trials in the way that reinforcement learning would suggest, because otherwise this tit-for-tat policy would do much better than the reinforcement learning policy at predicting behavior.

And we have another piece of evidence, this one neural. This is a very common finding in reward learning paradigms, which happens in the ventral striatum and also in some dopaminergic areas such as the VTA. When you make a decision, you get an expectation about how much you will receive from this trial, and then when you get an outcome, you get a positive effect of the reward that you receive and a negative effect of the expectation. So you have a reward minus the expectation. That's exactly the prediction error signal that you previously saw in those mathematical slides, and this is a very common finding throughout reinforcement learning research.
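To make the contrast between the tit-for-tat strategy and a reinforcement-learning account concrete, here is a small sketch. It is an illustration under assumptions, not the model comparison from the study: advice is coded 1 for truthful and 0 for a lie, and the function names, the learning rate of 0.3 and the starting belief of 0.5 are all hypothetical. Tit-for-tat looks only at the last trial, whereas the delta-rule learner integrates over the whole history.

```python
def tit_for_tat_belief(advice_history, initial_belief=0.5):
    """Belief that the next advice is truthful, using only the last trial."""
    if not advice_history:
        return initial_belief
    return float(advice_history[-1])


def delta_rule_belief(advice_history, learning_rate=0.3, initial_belief=0.5):
    """Belief that the next advice is truthful, integrated across trials."""
    belief = initial_belief
    for was_truthful in advice_history:
        # prediction error: what happened minus what was expected
        belief += learning_rate * (was_truthful - belief)
    return belief


if __name__ == "__main__":
    history = [1, 1, 1, 0]  # hypothetical: three truthful trials, then one lie
    print("tit-for-tat:", tit_for_tat_belief(history))
    print("delta rule: ", round(delta_rule_belief(history), 3))
```

In this toy history a single lie drives the tit-for-tat belief all the way to zero, while the integrating learner only lowers its belief partway, which is the kind of difference the behavioral comparison relies on.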
The question is whether we would find a similar signal in the social domain, which would really convince us that something like temporal difference learning, like reinforcement learning, was happening in the social paradigm. Again we can look for that using brain imaging, and we find some regions that are very commonly activated in social tasks, in tasks involving theory of mind. But in this experiment they're doing something really rather simple. They're coding again for the prediction before the outcome, and then after the outcome they show a negative effect of the prediction and a positive effect of the event of a lie. So here again we've got a classic prediction error on whether somebody else will lie to me. I'm expecting him to lie to me and then he tells the truth: I get a negative signal. He lies: I get a positive signal. These two pieces of evidence, we thought, were good evidence that the subjects were performing this social task in a very simple reinforcement-learning-type way.

So then we thought the interesting thing is to go and look at the cingulate cortex and see if we see similar signals for the value of social information as we do for the value of reward information, or outcome information. We have these different signals, as I was explaining before. We can look at the value of reward information, just like we did before, and we can again see a signal in the sulcal portion of the anterior cingulate cortex, and this is very similar to the monkey lesion in the sulcal portion of the anterior cingulate cortex. Then we can do the corollary in the social domain and look for a signal which looks like this, the value of social information. When we do that, we see a signal in the gyral portion of the anterior cingulate cortex.
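The two parallel learning problems, and the sign convention of the social prediction error just described, can be sketched as follows. This is a hedged illustration only: the class name, the fixed learning rates and the initial beliefs are assumptions, and the fixed rates merely stand in for the volatility-dependent weighting discussed in the talk. A single outcome updates two beliefs at once, the probability that blue is correct and the probability that the confederate lies.

```python
from dataclasses import dataclass


@dataclass
class ParallelLearner:
    p_blue: float = 0.5       # belief that blue is the rewarded color
    p_lie: float = 0.5        # belief that the confederate's advice is a lie
    reward_rate: float = 0.2  # hypothetical learning rate, reward information
    social_rate: float = 0.2  # hypothetical learning rate, social information

    def observe(self, advised_blue: bool, outcome_blue: bool):
        """Update both beliefs from one outcome; return both prediction errors."""
        lied = advised_blue != outcome_blue
        reward_error = float(outcome_blue) - self.p_blue
        social_error = float(lied) - self.p_lie  # positive when a lie occurs
        self.p_blue += self.reward_rate * reward_error
        self.p_lie += self.social_rate * social_error
        return reward_error, social_error


if __name__ == "__main__":
    learner = ParallelLearner(p_lie=0.8)  # expecting a lie
    print(learner.observe(advised_blue=True, outcome_blue=True))   # truthful advice
    print(learner.observe(advised_blue=True, outcome_blue=False))  # a lie
```

Expecting a lie and receiving truthful advice yields a negative social prediction error, and a lie yields a positive one, matching the description of the signal above.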
Again, this looks really rather similar to the lesion in the gyral portion of the anterior cingulate cortex, which prevented an animal from paying to look at another animal, essentially. So this is evidence, I think, that not only is there a dissociation between these two brain regions in terms of what domain of information they process, but that despite that dissociation they're in fact processing really very similar things. They're processing the same computational parameter, which relates to the value of the current piece of information that you're witnessing and its role in learning.

We've got one other piece of evidence to support that. Actually, there are a couple of others. Here's the first one. I've shown you so far that the value of information across trials correlates with these signals, but the next thing we could ask is whether the value of information across subjects predicts these signals as well. Obviously, if you're playing a game where you're asked either to follow somebody's advice or to rely on your own experiences, there'll be some people, a bit like George Bush, who'll just do whatever they're told, and some people, like Dick Cheney, who'll never listen to anybody else. So you can use that variance across different people to ask: do people who always do what they're told have bigger signals in the cingulate gyrus, and do people who always follow their own advice have bigger signals in the cingulate sulcus? And the answer to both of those questions is yes.

The last piece of evidence looks at a region of the brain that Hilke was telling you about earlier. This is a ventromedial part of the prefrontal cortex, which in many studies has been activated in the context of the value of the thing that you're currently choosing. So here again we can look for a signal which
relates to the value of the thing that you're currently choosing, but we have an interesting flip on that, because in some people, in the Dick Cheneyites, this signal will look like this, but in the George Bushites this signal will look like this, and these signals themselves decorrelate. So we can ask of this signal how much of it looks like that and how much of it looks like that. In doing so, we can then correlate that with how much signal there was in the cingulate cortex when they were witnessing that information beforehand. So this is during a later decision, and we can ask: can we predict this brain signal here, which tells you how much I'm currently valuing things, by how active the anterior cingulate cortex region was at the time when I earlier witnessed that information?

You can see that if I have high activity in the anterior cingulate sulcus when I witness a piece of information, then later on this signal will look much more like the Dick Cheney signal. But if I have high activity in the anterior cingulate gyrus when I'm witnessing that piece of information, then later on this value signal will look much more like a George Bush style signal. So the cingulate signal, or the relative weight of the gyral and sulcal cingulate signals, predicts how you will integrate those two pieces of information into an overall value signal, which will then hopefully guide behavior.

Okay, so that's me done, pretty much. I'm just going to run over the main points. We think that there's something to do with learning in the cingulate cortex. I don't know if you've got hold of that from my 40 minutes. Across a number of different situations it predicts inter-individual variability. We think there's some dissociation between the
gyrus and the sulcus such that the gyrus codes for social information and the sulcus codes for information about your own actions and rewards but we think there's a striking similarity as to what processing they do in those two different situations and the outputs of these parallel learning signals are processed in a region that heike was talking about earlier and these are the key people particularly matthew rushworth who i work with in great detail and [Applause] is that too long i think it's a beautiful lecture a beautiful example actually of what mathematical theories can contribute to understanding in particular the concept of theory of mind is still a very fuzzy concept but here the notion of social value is actually given a mathematical definition if i can ask the first question maybe is the consequence of your talk that humans don't differ from rhesus macaques in social evaluation that we have a similar system in anterior cingulate cortex or are there aspects of social evaluation in your experiments that are specific to humans so i think that monkeys do do social evaluation and i think that it's almost certain that they share some anatomical features and some anatomical structures that perform very similar jobs to those features in humans and the cingulate cortex is a very good example of a region that appears to be quite well preserved in terms of function between a monkey and a human but for some of those other regions that i showed you for the social prediction errors perhaps a lot less is known about the social function of those regions in monkeys than is known in humans particularly because a lot of the human results from those other regions focus on tasks that it's very difficult to know whether a monkey is doing so things like theory of mind understanding the intentions of another individual and you could design tasks and we've thought
about designing tasks which try and look at whether a monkey is solving those tasks but some of those brain regions have expanded dramatically between the macaque monkey and the human and so it's not clear i'm certainly not aware of anybody who's done a theory of mind study in a monkey and looked at recordings from the sts or the dorsal medial prefrontal cortex but it's not clear where the homolog of dorsal area 10 in a human would be in a monkey even so yes and no i strongly think that social cognition relies on some structures that would not have evolved for social cognition and i think the gyral portion of the anterior cingulate cortex might be one of them but i'm not saying that all social cognition is processed in the same way between a monkey and a human johnstone you may have answered this question already but if so i didn't get it are fast learners also fast meta learners oh so a fast meta learner is somebody who quickly learns that and so you need a really complicated experiment to ask that question because you have to have the rate of change of volatility changing through time i haven't done that experiment so i don't know the answer i think it might be given how hard we struggled to design an experiment that would look at meta-learning i would be terrified of the prospect of designing an experiment which looked at meta-meta-learning but maybe it's possible i don't know yeah it's an interesting question i'm not sure whether you convinced me that what the anterior cingulate cortex gyrus and sulcus are doing is really manipulating or controlling the learning rate as opposed to sensing the changes in stability in the environment that is it's learning
about volatility in you know your language and hence passing that on to the standard circle brain circuitry that's involved in learning and we also see implicitly learning rates like in the dopamine neurons so i'm not quite sure i think there might be some detail in your question i didn't get but i'll tell you what i think is happening and then you can tell me why i'm wrong so i don't think it's computing the learning rate or certainly storing the learning rate partly because it's only active at the time when it sees the outcome so there must be some other regions that are doing some complex computations to either compute it or to store it i think it is processing the influence of the learning rate on the action you're currently seeing i think the neurons there are saying this is an important piece of information now you should be learning from this one but on other trials they're saying don't learn from this one that's what i think it's doing so it's basically recording the stability of the environment from you know you take a choice the stability has because they've changed your learning rate so it is learning i don't think it's learning about the learning rate i think it's processing the influence of the learning rate on how much you process the current trial exactly that's what i think thank you for your talk i have a question about one fact when you present subjects a series of trials with a purely random factor subjects find some spurious regularities which actually don't exist so it seems like subjects actually don't adapt to a very stable environment but create some artifactual regularities how would you interpret these findings in your framework i see it's interesting i haven't really thought about it in that context but i don't know why people always over interpret stability or probabilistic outcomes as patterns it's absolutely clear they do and
when we first designed these experiments we had to go to a huge amount of effort in our instructions to subjects to tell them not to do so we even explicitly told them there weren't any patterns like a a b a in the data so my suspicion is that the finding of such patterns relies on a very different cognitive process from the cognitive processes that we've been investigating but i don't think i have a better answer than that really i don't know and i can just tell you that we tried as hard as we could to suppress that instinct in every subject that we instructed but yeah you may know angela yu and in a recent study she showed that actually there might be some like a default volatility used by subjects i see so they think that things are changing i see so they're just default integrators yeah we never went and measured i suppose i'm not sure how you could measure that because by trying to measure it you'd have to impose some volatility or you could just measure that integration under stability but you'd have to give a very long way i think because in order to understand that the world is a hundred percent stable you have to experience a great deal of trials so i mean i don't know i haven't seen the study but it does sound interesting so you're saying that they leak information even when they shouldn't yeah i mean i think that certainly is possible if i may ask a slightly related question did your experiment provide evidence for social biases or were subjects always accurate in their social evaluation was there for instance a collaboration bias so they would initially assume that the person was cooperative rather than like so i think the sort of formal answer to that question is given here so this is a parameter which just lets them follow whatever the
collaborator does and it's greater than zero but not significantly so but the informal answer is that in the schedule if we force the confederate to lie on the very first trial then the subjects behave in a much more sensible way than if we don't force them if they're allowed to tell the truth for the first couple of trials because it just sets up in the subject's mind the fact that this confederate really doesn't have the same motive as you once you believe there might be some shared motive then this parameter would be higher but if they know from the very outset that this guy might not have the same motive as them then they seem to behave rather optimally unless there is an urgent question i think we're going to break now and we're a little bit late but i think we'll convene again in about 10 minutes at 11 15 if we can quote you