Transcript for:
Causal Hypotheses Part 1

All right, on to evaluating causal hypotheses. In the last lecture we set up our account of what causation is for us: what it is for a causal variable to have an effect in an individual, whether that individual is deterministic or probabilistic, and then what it is for a causal variable to have an effect in a population. And since it's difficult to determine an individual's complete residual state, which you need if you want to say for sure whether there's an effect or not, we worked up a way of using a population to determine whether a causal variable has an effect in an individual, be that individual deterministic or probabilistic. So that's all the conceptual machinery, the background.

Now we're going to get into actually evaluating causal hypotheses. This lecture we're going to focus on randomized experimental trials. The next lecture, which I believe is Wednesday, since Monday is a holiday, will be on two other experimental designs that you can use if you can't do this one. The one we're talking about today is kind of the gold standard: if you can, you want to do this kind. But often you can't, and there are some other designs that give less good evidence for a causal hypothesis but can often be used.

As far as course stuff, I made an announcement in Canvas, so hopefully you saw that. The homework for chapter 7 is being extended a little bit, because it seemed like you had a deadline Monday and then another deadline Wednesday, so I pushed the Wednesday one. The other big thing: it's the same number of chapters for each exam, four chapters each, but somehow the second set of four is really pretty dense. I want to account for that and make it more manageable, and so, without
radically altering the syllabus, the solution I came to was to change the exam. Instead of trying to approximate an in-class exam by restricting the amount of time you have to a few hours, I'm making it totally take-home and giving you the whole weekend to work. It's always technically open book, since you're remote, so by all means use your book and your notes. But here you're not even really restricted by time. It's going to be the same sort of exam I would have given you in class, so I'm not making it harder. Because there's so much information being presented, I think you'll want to make use of your book, your notes, the lecture slides, and the lectures, and I think they'll give you a better shot at doing well on the exam. Again, the plan is not to try to trick you with very precise stuff; the point is to see that you understand the important concepts. The exam questions are typically not something you can just search for in a PDF and copy and paste the answer; they're more general questions. But I think having those resources will be helpful to you. Any questions, feel free to email me or catch me in office hours. Oh, they're starting now, we've got to stop this. I will continue this in a moment; I've got to hop into office hours.

Okay, it is 12 hours later, and I'm wearing different clothes since the last slide. I took a break, did office hours, and then lots of stuff happened. So here is the study that we're going to use for this lecture. It will give us an example of a randomized experimental trial. This kind, as I said, is considered the best, and we'll think of this rat story as an analog model to use as a guide for
evaluating other cases, which may have different details. This is our guiding example. So what's the story? In 1977 the FDA finds out about a study in Canada on rats that appeared to show that saccharin causes cancer in lab rats. Saccharin was in Sweet'N Low. I think it still is; it's not illegal or anything now. Now they have new ones, NutraSweet and a million other sweeteners, but at the time saccharin was kind of the main one. So the FDA issues a temporary, or preliminary, ban on saccharin. This was obviously very distressing to companies that sell saccharin, and they were very against it. So what they had to do was review this study, make sure it's good, and then decide what to do about it. What ended up happening was that they reviewed the study and found it was good, but they found they didn't need to actually ban saccharin in the end; they just put a warning label on it. We'll walk through the whole story, but it's a good example of how to evaluate a causal hypothesis.

You can read the whole excerpt, it's on page 209, but here's me summarizing what they said in even less detail, a summary of the summary. The best evidence they found for the causal hypothesis that saccharin is a positive causal factor for bladder cancer came from this report on what they call two-generation rat feeding experiments. We'll explain what we mean by two-generation in a moment. If you're using rats as a proxy for humans, to try to figure out whether a substance is dangerous or poses a risk of cancer, the rats have to get it the same way humans get it. If it's something that humans eat, then the rats need to eat it. You can't just inject them, and you can't even force-feed them. This
is just the rule for these sorts of things. Now of course rats are very different from people; an obvious thing is that they're very much smaller than people are. But we have a long history of studying carcinogens in particular, medications, all sorts of things, testing them on rats first before we test them in humans. And the success of these sorts of studies over the long run kind of confirms that this is a decent inference to make in many cases: if it's unhealthy in rats, or if it's effective in rats, you can say that it probably does the same thing in humans.

So how did this particular study work? They fed saccharin to two generations of rats. The first generation, as soon as they were weaned from their mothers, when they stopped drinking milk and could eat, started getting saccharin. The offspring of those rats were also fed saccharin after weaning, but those offspring also got saccharin from their mothers while they were in the womb and while they were nursing. The members of the second generation were then examined for cancer upon natural death or sacrifice after two years of the experiment. So either they die, or after two years they cut them open and look for cancer.

Why do you need two generations? Rats don't live as long as humans. Think about cancer-causing substances in humans, like smoking: you smoke for decades before you get cancer. They need to test this a bit faster. So you'll notice that they feed the rats quite a bit of saccharin, and by getting it to them in the womb they can hopefully produce tumors over the lifetime of a rat, which is short. For each experiment, the first generation and the second generation, there was also a control group that did not ingest saccharin.
So the simple answer to what happened is that, compared to the control group, the saccharin-fed animals definitely had more bladder tumors. We'll get to the statistical data in a moment, to make sure it's significant, but it was considered sufficient evidence for the claim that saccharin causes cancer in rats. We'll look at the table in a moment, but cancers were only found in the rats that received the most saccharin, where 5% of their diet was saccharin.

Their measure of statistical significance in this report is a p-value, which is slightly different from what we've been doing, but it's basically a measure of how likely it would be to get that same result purely by chance. If you ran the same experiment 100 times you'd have some variation. Just by pure chance a rat might get a tumor, and it's possible that by pure chance a bunch of rats could get tumors even if it had nothing to do with the saccharin. So a p-value tells you how likely it is that you'd get that result by chance, and what you typically want is a p-value of less than 0.05: less than a 5% chance of getting a result like this from random variation alone. They actually didn't find statistical significance in the first generation of rats. There was a difference, the saccharin-fed rats did have more tumors, but not enough of a difference to say that it couldn't be chance. In the second generation it definitely was significant.

So here is the actual table. Parental is the first generation; offspring is the second generation. We see the control group with a dose of 0% saccharin in their diet and the experimental group with 5% saccharin in their diet. Notice that there are far more cases in males than in females. In the first generation it looks like there were eight, I'm sorry, seven cancers found in the
experimental group and one in the control group. That difference, six more cancers in the experimental group, was not quite enough for statistical significance. But if you look at the second generation, the offspring, you've got 12 males with cancer compared to zero, and two females with cancer compared to zero. That's a much bigger difference, and it was statistically significant at the .003 level.

As we noted, the cancer is more frequent in males than in females. They didn't really know how to explain this and didn't try to. In humans, bladder cancers are also more frequent in males. We've always kind of assumed that the difference was because males tend to work in different sorts of industries. You've got a guy working with asbestos, and that causes cancer. I don't think it causes bladder cancer, but, you know, more men are smokers, stuff like that. There are lots of reasons that men might get more cancers. But interestingly, none of those explanations would apply to the rats, so there must be something else going on with the male rats. They would need to do another study to figure out what.

So in conclusion, the two-generation experiment showed that saccharin caused an increase in bladder cancer in second-generation animals, especially among males. In the first generation the result fell just short of statistical significance. Again, they calculated a p-value on their own, and in general you can believe it. If it's published in a good journal, it's unlikely they made any math errors. There is controversy about p-values, but for our purposes as non-specialists, if they say it's statistically significant then we can believe it. We can even run our own version of the numbers. They did give us the relative frequencies, so we can run our own version of the test of statistical significance as well.
But if they say it is significant, particularly at the 0.003 level, then you can trust them. Also, no other cancer site has been associated with saccharin, just bladder cancer. So that's just the summary of what happened in this study. We haven't run the evaluation yet; that's just a description of the study.

That sort of study is what we call a randomized experimental design. RED, do we call it? I don't know that I use that abbreviation in the rest of this, but that's what it means. We'll start with a method of evaluating these sorts of designs, and once we've got that figured out, in the next lecture we'll talk about some other sorts of designs and evaluating those.

You'll want to look at the real-world population of interest. Ultimately it's got to be humans; that's why we're running this. Someone out there might be interested in cancer in rats, but the reason the public cares, the reason you might use this as a pretext to ban soda or put a label on soda, is that it's supposed to apply to humans. But the only population they actually sample is lab rats. So strictly speaking our statistical methods only permit a conclusion about rats: we can make an estimation from the sample back to the population, but that population is all rats. Obviously there's an inference to humans implicit in here somewhere, but it's a separate issue from our statistical model.

The causal variable is ingestion of saccharin. The effect variable is occurrence of bladder cancer. The causal hypothesis is that ingesting saccharin is a positive causal factor for bladder cancer in laboratory rats. As we saw in the last lecture, when we talked about what we mean by these causal hypotheses, that's saying that more cases of bladder cancer would be present in the population if all members were exposed to saccharin than if none were. Remember,
those are our two populations, X and K: X, where everyone gets the saccharin, and K, where no one gets the saccharin. To say that saccharin is a positive causal factor is to say that in the K group, I'm sorry, in the X group, there's going to be more cancer than in the K group. Think of X as experimental and K as control.

Okay, so our evaluation. We're first going to do the evaluation without giving you the numbered steps, one through six. Later you'll get the numbered steps, but right now we're doing all the stuff we will eventually be doing, so you'll get it twice.

You want to look at the data. We've already looked at it, and here's the table again, but in your evaluation you're going to go through it in some more detail. In the first generation, the sample sizes were 74 rats that got no saccharin and 78 that got a diet of 5% saccharin. Of those 78, 7 got bladder cancer; of the 74 controls, only one did.

Now the experimental design. As we said, a causal hypothesis is actually about two hypothetical populations. You take the population U and you ask: what if in one world they all got the causal factor, and in another world none of them got it? How do we replicate that hypothetical situation in real life? What we do, as they did with these rats, is take real samples and make them play the roles of the hypothetical populations. You randomly sample from the real population, and then you divide that sample in two, also randomly. Then you take one of the groups, call it the experimental group, and make them live in the world where they all get saccharin, and you take the other group, call it the control group, and make them live in the world where they get no saccharin.

So here is the diagram. There's the real population in the gray box, and X and K are the hypothetical populations we were talking about before. So the control
group, as I said, we put in the condition of the K population, which doesn't get the causal factor at all. Maybe it's intuitive why that's equivalent to the hypothetical populations, or to having two actual populations that resemble the hypothetical populations and sampling from them. What we're doing instead is sampling from one population and dividing the sample into two groups. The question is, why is that equivalent? Maybe it's obvious, and I partly worry that their explanation here is more complicated than it needs to be. But okay: say you take all the rats, the whole population, and for each one you flip a coin to determine whether it gets saccharin or not. So 50% of the rats are getting saccharin. Then you could sample from that population, and you'd end up in the same situation as our experiment: a random sample where half get saccharin and half don't. So what we actually do, taking a random sample of a population and then dividing it in two, is basically equivalent to creating the hypothetical populations and sampling from those.

The lab rats are bred specifically to resemble rats in the actual population of rats, so that inference is fair enough. What you end up with is two observed relative frequencies of rats with bladder cancer: the frequency of the effect in X and the frequency of the effect in K.

So let's look at the figures here. Again, on the left we've got our model, with the real population and our hypothetical populations. What we actually do is randomly sample 152 rats from the real population and then divide them, again randomly, into two groups: one in which all get saccharin, the other in which none get saccharin. There are 78 in the X group and 74 in the control group, and now we've got our relative frequencies. So the
frequency of the effect in the X group is 7 out of 78, and in the control group 1 out of 74. Again, this is the first generation.

Notice that we had a random process twice here; the randomness came in at two stages. First, the sample from the actual population has got to be random. You don't want to be picking rats that live near a toxic waste site or something like that, or that are otherwise eating an abnormal diet. You can go back to the definitions of randomness, but it should be that any rat has an equal chance of getting sampled and that there are no correlations. Second, the division of the sample into experimental and control groups should also be random. You don't want to divide it into all males and all females, or divide it by weight, or end up with the healthier rats in one group and the sicker ones in the other. Now notice that the OTA report doesn't mention any of that stuff, but this is the sort of thing that you can pretty much assume in a published study that's been through peer review.

As an aside, I'll probably bring this up in the review session for the exam, but it's good to know before: if any of you are thinking of writing on the second topic for the essay, you'll notice that it's not a causal thing. It's from two chapters ago, where we're just talking about statistical hypotheses about a correlation in the population. I've simplified it a bit. One of the things they worry about a lot in the book is how well a sampling method approximates random sampling, and they come up with all kinds of reasons why it might not. You may recall the sample of schoolchildren, and how they said that if you're homeschooled, maybe you wouldn't get sampled, and so on. For the purpose of the essay that is totally simplified, and I just stipulate that it's random
sampling. That's to make it easier for you if you choose that second essay topic. And you will find later on that in a study like this, lots of times you can just assume it's random enough. We'll see as we get into the other experimental designs that it starts to become more of a worry again. But for the purpose of a randomized experimental design, if it's published, it's probably random. And for the essay, I say it's random, so you can take my word for it and say that part of it is good.

Attendance quiz. Attendance quiz: type it into your browser. Okay, I'm going to figure you've got it in there by now. Moving on.

Figure 8.3, which my head is covering a little bit, is a diagram of the data from the first part of the saccharin experiment. Notice it looks a whole lot like the sort of diagram we'd use to evaluate the existence of a correlation between two variables. In this case one variable is the cause variable, the presence or absence of saccharin in the diet, which characterizes the two groups, the experimental group and the K group. The second variable is the effect variable, and that's the occurrence or not of bladder cancer. We have 9% in the experimental group and, I think, 1% in the control group. Again, they use p-values and they just tell you it's not statistically significant. But you could also take those sample sizes and look in your tables, and you would also come to the conclusion that this is not significant; there's going to be some overlap. We'll get into that a bit later, but these are the two relative frequencies.

So is there a statistically significant difference between the relative frequencies? We determine that by using those relative frequencies to work backwards to the population and estimate the
values of the probability of the effect in X and the probability of the effect in K in the population. So we've got to find the intervals, and there's the equation: the probability of the effect in X will be the relative frequency of the effect in X plus or minus the margin of error, and the same for the probability of the effect in K. If the intervals don't overlap, you've got a statistically significant difference, which is good evidence for the existence of a causal relationship between the cause and effect variables. It's very similar to the correlation stuff; the math is basically identical. The difference is all in the experimental design.

If the intervals do overlap, the difference between the relative frequencies is not statistically significant, which means that there is not good evidence for a difference in the probabilities in the population, which in turn fails to be good evidence for a causal relationship. So if there is overlap, you'll say there's no evidence for a causal relationship. Now, that is different from saying that there is evidence for no causal relationship. Overlap just means that the evidence is consistent with there being no causal relationship. It's also consistent with there being some causal relationship with a relatively low level of effectiveness. So you have to be careful with the wording: you don't have evidence for a causal relationship, but that's not evidence that there's none whatsoever. There might be a little bit.

Estimating effectiveness is again the same math as estimating the strength of a correlation. If you've got your intervals, you take the farthest ends and calculate the difference between those, and then the nearest ends and calculate the difference between those. The actual strength of correlation lies somewhere in the interval between those two differences. I think I said strength of correlation, but I mean the effectiveness of the
causal relationship.

Okay, so if you look up your margin of error for a sample of 78, you're looking at roughly a 12% margin of error. So the intervals definitely overlap; you've got 9% and 1%. Now, if we used a slightly different confidence level, if we weren't shooting for 95% confidence and pulled it back a little bit, say to 92.5, then you would have a significant result. You'd want to end up with a margin of error of 4% or less; at 4% the intervals would meet around 5%. So just a reminder that these claims all assume we're shooting for 95% confidence. If you're willing to be less confident, your margin of error will shrink. But typically 95% is going to be our default.

Okay, having done that with the real example, here is the program, the format for your homework and your exam.

Step one: what is the real-world population and what is the causal hypothesis? Identify the real-world population actually sampled. Here's a good point to note any important differences between the population actually sampled and the population of interest. It is important in this case that the population of interest is humans and the population actually sampled is rats. You want to identify the cause variable and the effect variable and then state the causal hypothesis: the hypothesis that there's a positive causal relationship between X and Y, something like that.

Step two: the data. Identify the real-world samples and the particular data they found from those samples. What were the relative frequencies of the property of interest in the two samples? Be sure to give the sample sizes, if they give them to you.

Step three: identify the design of the experiment. We haven't seen other designs yet, so for right now
the only design you know about is the randomized experimental trial. If you're talking about this rat experiment, that's what you would say here. And explain why you think that's the best model for this sort of experiment. Why, in the rat case, is it the best model? Well, they've taken a random sample, randomly divided it into two groups, and then manipulated those groups by applying the causal factor to all the individuals in one group and none of the individuals in the other. That's pretty clearly an experimental trial: you're doing a manipulation, taking these groups and actually doing something to them. It's going to be fairly obvious. If you're studying smoking in humans, you're going to have to do a different kind of trial. You're not going to take a group of humans and make them smoke.

Step four: how well does the random sampling model represent the actual process by which the sample was selected from the population? There's a range of answers: very well, moderately well, somewhat well, not very well. In this sort of design it's going to be very well; a randomized experimental trial does a great job of approximating random sampling. There could be issues. Does every rat in the population have the same probability of being chosen? Well, no, these are bred for labs, so you might state that that part is not satisfied. But they're specifically bred to resemble wild rats very well, so that's an issue of randomness that they're aware of and that they deal with. As far as any correlations in the population, it's okay so long as the rats aren't sickly and as long as they're randomly divided. Again, if you're reading this in a journal, they may not even get into the specifics, and you kind of have to just assume it. Those are some of the things you might say in this step.

Step five: evaluating the
hypothesis. Here's where you run the numbers. Is there evidence for the causal hypothesis? You take the relative frequencies and apply the margin of error, so that you have an estimate of the probabilities in the population. Is there overlap or not? Calculate the effectiveness. That's where you do all that math.

And then finally, take a moment, in light of all your answers from the previous steps, to give a summary of how well you think the data support the hypothesis: very strong support, somewhat strong, not very strong. Note the various factors supporting your answer. The fact that this is a randomized experimental design, which is kind of the best kind, will count. So will how effective the causal factor was, and so on: all the things that contribute to your answer.

You may find it helpful to draw out the boxes. If you're really good at the math and you just want to use the rule-of-thumb calculations that we have, that's fine too, but it's definitely helpful to draw the boxes. When you do step one, draw a box representing the population sampled. You might have a larger box either above or around it that represents the actual population of interest, and write in anything you want. Visualizing it all may also help you think of different factors: is this really random? Is this sample really the best sort of sample? Does it reflect the population at large? Are there any differences? It may help you think through those issues. When you do step two, add smaller boxes off to the right representing the samples. You can label those boxes with the data: the relative frequencies, sample sizes, and stuff like that. For step three, connect
the boxes and label the sort of design that's being used in the study. For step five, you've got the boxes now, so you can draw out the intervals and do your calculations. It's helpful to think with your hand on paper and get it out there; trying to do it all in your head can be rough. It's not required, but you may find it useful. If you do a neat job of it and want to put that on the exam, great, that helps us as well. Again, you can get a perfect score without doing the diagrams, but they can be helpful, and we'll take a look at them.

So here's an example of one such completed diagram. You've got the gray box, which is just the population of lab rats. Here the actual population is taken to be just lab rats, which means they're restricting the hypothesis to just lab rats, and that solves the issue of whether lab rats are representative of the greater population of rats in the world. That's one way to do it. Through a random process there's a sample of 183, we see the little box there, and then it's split into two groups. You've labeled the numbers in each group, 94 and 89, which is X and which is K, and what conditions they get. Again, there are little arrows to point out that that process is also random. Then you can jot down the relative frequencies you find and point out that the difference is statistically significant. This one is, I believe, from the second experiment; that one was totally significant.

So let's walk through the actual evaluation of that second-generation experiment. Again, these were the rats whose mothers were fed saccharin from weaning till death, so these rats were exposed to saccharin from conception for the rest of their lives, or at least for two years until they were sacrificed.

Okay, step one: what's the real-world population and what's the causal hypothesis? So the population sampled
consists of rats bred for laboratory experiments causal variable ingestion of saccharin effect variable bladder cancer all this is the same as the first generation uh the hypothesis at issue is that ingestion of saccharin is a positive causal factor for bladder cancer in a population of lab rats um here is a good time maybe to note that um the population sampled is not humans right um but it's kind of supposed to be relevant to humans and there's a little gap there okay step two what's the data well the overall data is that you know here it's a um kind of up to you to decide how much detail you want to go into um this seems to be about right complete enough right so they don't break it down into males and females because it ends up the conclusion is not sort of relative to that it's interesting why all males but that's not really anything that they get into or anything they were looking at so you can sort of skip that breakdown so um this is a little open to interpretation right what is the relevant data that's worth putting in here and what can you leave out so here they just say look uh the experimental group right 14 of the 94 got bladder cancer 15% um and in the control group uh zero out of 89 so yeah 15% in the experimental group 0% in the control okay so step three what was the design well the experiment fits the model for randomized experimental design um why well the sample from the population was randomly divided into two groups one got the causal factor and one didn't okay step four random sampling um and here they basically say yeah right it's a randomized experimental design it's a professional study reviewed by the OTA um right it got through peer review um it's probably good right on that point um step five now we evaluate right so um here now the study has evaluated it for you right they're using p values which is again slightly different than what we're doing right um You can trust that they've done their math right
that's something that would definitely get caught in the peer-review process right and folks that allowed their paper to get published know much more about statistics than you or I probably do right um so if they say it's significant you can say it's significant you can just sort of repeat that if you like if you want to run the numbers yourself you're welcome to um in this case um using our rules of thumb it probably wouldn't actually come out significant using our method right um and they explain why here right so you may recall um one of the slides we sort of mentioned this when we were talking about statistics when you get um really low values when you get out towards the sort of tails of the distribution you have like 0% of something um the margins of error start to get smaller than they are in our usual rules of thumb right they get a bit distorted right and so we haven't provided you all those tables right so um that's why if you ran these numbers probably they would overlap a bit um based on our rules of thumb and our tables but the reason that it's actually still significant even though it would seem not to be from our sort of quick and dirty table methods is because of what happens when you're all the way at 0 percent.
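To make the point about margins of error near 0 percent concrete, here is a minimal sketch using the counts from the study (14 of 94 and 0 of 89). It rests on two assumptions that are not from the lecture: the course's rule-of-thumb tables are approximated by a single worst-case margin sized for a 50 percent frequency, and the Wilson score interval is swapped in as one standard interval that tightens near 0 percent. The study itself reported p values, so this is not the authors' calculation, and the function names are only illustrative.

```python
import math

Z_95 = 1.96  # normal critical value for a 95% confidence level


def worst_case_interval(successes, n, z=Z_95):
    """Crude interval: one margin of error sized for p = 0.5 (assumed stand-in for the rule of thumb)."""
    p = successes / n
    margin = z * 0.5 / math.sqrt(n)
    return max(0.0, p - margin), min(1.0, p + margin)


def wilson_interval(successes, n, z=Z_95):
    """Wilson score interval, which narrows as the observed frequency nears 0 or 1."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def overlaps(a, b):
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


experimental = (14, 94)  # bladder cancers / rats in the X group
control = (0, 89)        # bladder cancers / rats in the K group

crude_x = worst_case_interval(*experimental)
crude_k = worst_case_interval(*control)
wilson_x = wilson_interval(*experimental)
wilson_k = wilson_interval(*control)

print("worst-case intervals overlap:", overlaps(crude_x, crude_k))
print("Wilson intervals overlap:", overlaps(wilson_x, wilson_k))
```

With the crude margin the two intervals overlap, which matches what the quick table method would suggest, while the interval that accounts for the frequency being at 0 percent is narrow enough that they separate.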
um okay so the big takeaway from that is like yeah if in the peer-reviewed study they say it's significant at a very high level um just believe them I don't think that you're catching some statistical error okay so and then step six summary uh clearly a careful study in line with random sampling models right um the report indicates we could raise the confidence level to 99% remember it was significant at the 0.003 level right so very significant um so we're good very strong evidence in favor of the causal hypothesis that saccharin is a positive causal factor of bladder cancer in rats right um and if there are any potential issues with the study it would probably be issues with applying the result to humans so here is the figure again we saw it a moment ago right but um in the process of doing this you might draw this out okay one uh point about this issue of applying this study of rats to humans so um the hypothesis as we stated is just about rats right um so there's obviously some controversy about how to apply that knowledge to public policy regarding humans um rats are not humans right obviously and secondly uh the rats were fed quite a bit of saccharin so 5% of their daily diet was saccharin which is the equivalent for a human of 800 bottles of soda a day so far more than any human um would drink so why give the rats so much right isn't that sort of a big flaw in the design um well it's a good sort of proxy for what we get in the real world so in the real world right yeah not everyone who eats saccharin or smokes is going to get cancer um but when you've got so many people in the world right engaging in these sorts of activities even if a very small percentage of smokers get cancer or a small percentage of saccharin eaters get cancer it ends up being a lot of people right we've seen this sort of thing with covid-19 even if 1% of people get really ill well in a country of 300 million people that ends up being a lot three million people
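The small-percentage, large-population arithmetic above can be written as a short sketch. The function name and its parameters are hypothetical, chosen only for illustration; the 1 percent rate and the 300 million population are the figures from the lecture's example.

```python
def expected_cases(risk, population):
    """Expected number of affected individuals, given a per-person risk and a population size."""
    return round(risk * population)


# The lecture's example: 1% of a country of 300 million people
cases = expected_cases(0.01, 300_000_000)
print(f"{cases:,} people")
```

The same function applies to any per-person risk, which is why a risk that looks negligible for one individual can still matter at the scale of a whole population.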
right so um you could do the study that way which would be I guess more realistic right you feed thousands of rats relatively small doses right and then see what percentage develop cancer but just doing a study with thousands of rats is really sort of um so it's a generally accepted practice that you can sort of substitute a larger dose for a larger sample right so I don't want to feed a thousand rats a little bit of saccharin every day maybe the equivalent of two cans of soda or whatever so I'm going to take a smaller sample and feed them all the equivalent of 800 cans of soda um and it's generally accepted that it works out roughly the same um now there is a reason that that wouldn't be the same if there was a threshold effect right so a threshold effect is like maybe it's possible that cancer is only triggered by really big doses right and somebody who drinks normal doses for the rest of their life would never pass that threshold right um but uh so far as we can tell there's not a lot of these sorts of threshold effects in biology all the effects we see seem to be a bit more linear right um and as for whether rats are similar enough to humans to draw conclusions about the effects of saccharin um I mean so far all the substances that we have found cause cancer in humans also cause cancer in rats um so that's interesting of course it's different from the claim that every substance that causes cancer in rats would cause cancer in humans right which would be the inference you would maybe want if you wanted to take this study and apply it to humans um but it's suggestive enough that there's not really a good reason to think that saccharin would somehow be an exception to this correlation we see right between cancer-causing agents in humans and cancer-causing agents in rats um so it turns out that the cancer rate found in rats translates to about a six in one million chance that a human drinking a normal amount of soda will get cancer um so it's a
risk but kind of a relatively small risk right small enough that it seems like um a person can decide for themselves whether they want to drink soda and run that risk right so that's sort of a public policy decision um that's probably why they just went with a warning label right instead of a ban of saccharin um and that's a matter for chapters 9 and 10 where we'll talk about making decisions based on the evidence we have um for now right we're just saying what sort of evidence can we even say science has given us and in this case science has given us evidence that there is a causal relationship right and it's up to the policy makers using decision theory or whatever to decide what to do with it so a note about double blind trials you may have heard this terminology right so you really have to make sure that the only systematic difference between these two groups the experimental and control groups is the causal variable one's getting the causal variable and one isn't right so experimenters typically are very careful about this so for example like um when there are studies that require an operation on the uh experimental group uh they'll give the control rats an operation as well that does nothing but they'll still cut them open and sew them back up just to make sure that it wasn't like the cutting open and sewing back up that produced the effect right um and then in experiments on human subjects we often use what we call double blind studies so this means that the subjects that are in the study they don't know if they're in the experimental or control group right so for example if it's testing a medication um the experimental group will get the medication and then the control group will get a pill that is nothing right it's a placebo a sugar pill or something um so they don't know whether they're actually getting the medication or not um because you've probably heard of the placebo effect right um just knowing that you're taking a pill and thinking you're
getting medication will sometimes cause changes in your body right some sort of psychological effect or something um so you know they'll take a sugar pill and be like oh yeah my headache went away right it's a powerful sort of psychological effect so they want to make sure that if they're doing anything to the um right experimental group that is not strictly speaking the cause right so in addition to the medication they're taking a pill right and they're seeing themselves take a pill make sure you do that to the uh control group as well so that's just single blind uh in double blind the subjects don't know which group they're in and also the experimenters that are doing like diagnosis or interacting a lot with the subjects they also don't know which are in the control and which are in the experimental group right because um particularly if they're doing diagnosis right so they're trying to decide okay is this effective for colds or whatever and they have to evaluate symptoms right of the subject and decide whether they have a cold or not well um you don't want subconsciously their knowledge that they're getting the drug or not getting the drug to affect their right evaluation of the subject right of course somebody somewhere has to keep track of who's in which group right but those people shouldn't be really interacting with the subjects because they could subconsciously be treating them differently they could subconsciously be um tweaking the data so you can actually by using placebos sometimes use the same group of subjects as both the experimental and the control group just in different sessions over time right depending on what it is so if it's giving like a headache drug right then yeah for sure you could um give one group the headache you know they'll get one set of headaches and you give them the experimental drug see how it does and then on another day or whatever when they get another headache give them a
placebo and that way the same people can be the control and that's nice because it matches right remember the residual states that we were worried about in the last lecture well if it's the same person it's basically the same residual state I mean I guess your residual state can change over time a bit but it's as close as you're going to get to having the two groups be exactly the same use the exact same people in both groups okay well that's enough we're just over 50 minutes so apologies for going a little long um but yeah uh hit me up in office hours if you have any questions or anything like that and so on right um always here always happy to help want to make sure you're all getting this okay