Transcript for:
Quasi-Experimental Designs Overview

come to this concept of the quasi-experiment, and I want to spend a bit of time now talking about this, because I think very often one of the problems is that we can't do a true experiment. We'd like to, but we're not able to, partly because you can't randomize properly, or it might be because you haven't got a lab available, or because people can't travel to the lab, or you simply can't organize things in that kind of fashion. But we'd like to be able to keep some of the power of the true experiment. The power of the true experiment is to be able to say something about what caused the difference in the outcomes, the difference in different variables. So we'd like to keep that power, we'd like to keep the strength of the experiment, but we're not able to do it in the proper fashion, and that's what the quasi-experiment is trying to do. It's called a quasi-experiment because it looks like an experiment but it isn't really; it's missing certain key factors.

I'm basing this on the discussions in these classic works, which go right back some 40 or 50 years: Campbell and Stanley, and Cook and Campbell, two books here. I should say, don't even think about reading those books unless you're a very, very hot statistician. They're quite heavyweight books, lots of stats in them and so on, which would just put you off. But I thought I'd give you the list of sources so you know where it comes from. I'm not recommending these readings to you; read some simpler things to begin with before you get into the tough stuff. So these are rather tough books, but I will be using some stuff from them. In fact, you'll notice the handout I will use later, which I gave you in the previous week, on the threats to internal validity, actually comes from Cook and Campbell's book. They list some of those things; I think I gave the reference to it in the presentation. So that comes from
there, and also I'll be using later on today some of the designs that they talk about as well. So I've taken things from those books to structure the rest of this session.

Now, what is a quasi-experiment? As I said, it looks like an experiment but really isn't. The point is that it has the general style and approach of experiments: you have groups, you have treatments, you have manipulations, things you do, and measurements or recordings of things. Even if you're not measuring things you can record them. All of that is just like a true experiment, but the one thing that's missing is that there's no randomized allocation of participants. When you have two or more groups, you don't randomly allocate to those groups; rather, those groups exist already or they're picked in some other way. Very often they exist already, but they're not randomly allocated. So if you go into a school, the classes are there, so you might use different classes as your different groups. Then you'd be talking about a quasi-experiment, because it might be an experiment in all kinds of other ways, but it's missing that one key factor. And of course that immediately undermines the whole idea of the strength of the experiment, which is to link the causes to the outcomes, and you can't do that.

Nevertheless, and this is the importance of this work, I think, Cook and Campbell suggest that if you're careful how you do it, if you pick certain designs of experiment and not others, and at the same time you're careful about inspecting all the various threats to validity, things that might have gone wrong with what you're doing, things that might undermine your results, then you can use these designs to say something about what is causing the effects that you discover. So you can make some judgment about causal conditions on the basis of that, but you have to be very careful about it. The place to start is with the ones
to avoid. These are designs that Cook and Campbell suggest we should avoid under any circumstances. These are not experiments; they're not even proper quasi-experiments. They're just too dangerous. What they mean by that is that there's too much chance that we'll make false deductions about what's causing the effects.

So here is what they call the one-group post-test-only design. You've got three stages, or three events. You've got an experimental treatment group, just one group, and that's a clue here: it's just one group, no control. We do something to them, we treat them: we give them some kind of teaching, or we put them into a loud noise, or we give them some Mozart, and so on. And then we observe what happens to them. We might measure how much they're learning, or we might measure their reaction times, all sorts of other things. Whatever we're doing, at the end we observe them. Observation I'm using in a general sense: it could mean looking at them, it could mean talking to them, and it very often means doing something where they get a test and you get a number at the end, or some kind of measurement. So it can be numeric, but it doesn't have to be.

So we have the experimental group, a group of people. Remember, they're not randomly allocated, they're not randomly selected, they're just a group of people; that's a key point about quasi-experiments. We do something to them and then we see what happens. And of course the result of this, which I hope you recognize now, is that we've got very little warrant, very little justification, for any deduction of cause and effect. We observe that something happens to them, but how do we know that was caused by the treatment, or whether it was caused by something else, or whether it was something to do with the group themselves? We know nothing about this group. We've just got a group of people, we do something to them, and then we observe something. Well, how do we know
that that's because of our treatment? How do we know that's not because of the group of people themselves and what they're like, or something else that happened to them in the meantime, and so on? So this is just simply a bad design and we shouldn't do it.

Now, you might say, hang on a minute, this looks like a survey to me. This looks like we've got a group we've done something to, given them the questionnaire, and then we observe the results; we go through and measure what they've done at the end of that. Isn't that a survey? Well, the answer is yes, which is why in a survey we're so careful to make sure that the sampling is done properly: we're using random sampling or quota sampling or whatever. In this case, of course, that's not true. The experimental group has not been randomly selected or randomly allocated in any sense whatsoever, so we've not got that safety valve that we have in the survey. So avoid this one.

So, clearly, this one might look slightly better. We've got two groups now: we've got group X1 and we've got group X2. Remember again, they're not randomly allocated; they simply exist already or they've been picked in some other way. One group gets the treatment, that's the T, and one group either gets a different treatment or doesn't get a treatment at all; it's the control group, if you like. And then we observe both of them separately. We observe what happens to group X1, which gives us observation O1, and then we observe what happens to group X2, so we get observation O2, and we see if there's a difference. We might expect the results of O1, the observation of group one, to be different from the observation of group two, because they've had the treatment T done to them; they've had something done to them that's changed how they're going to behave. So that would be our expectation. But if you do find a difference, we can't be sure that it's the
treatment that's made the difference. For example, there can be history effects on X1 only. Let's imagine we're doing this in a school, and it turns out that X1 had a really good maths teacher before they did the experiment, and X2 had a whole range of supply teachers who they had no real relationship with at all, and some of them were quite dodgy and so on, so they had very poor teaching beforehand. And then you give them this test, and you're trying some new technique of maths teaching with X1, and of course they do very well. Great. Well, that's got nothing to do with your method; that's surely to do with the fact that they had good teaching beforehand. So X1 had a good teacher and X2 didn't. And of course this comes about because we haven't randomly allocated; we've simply picked two groups that may have had some difference we can't tell beforehand. So history effects might make a difference. Or it might be that O1 and O2 are done at different times. One might follow the other: X1 might be done one month, X2 the next month, and things that happen between those two points might make a difference to the two groups.

There can be mortality problems. The term mortality, as I talked about in the case of surveys, is used here not necessarily to mean dying, although it can, but to mean people dropping out or leaving in some way. The example given here is that the treatment causes more dropout than the control does. So maybe there's something about the treatment we're carrying out that causes people to drop out of the experiment more than the others do, and that gives rise to some kind of bias.

Maturation problems: again, this is true of schoolchildren. Imagine you did group X1 at the beginning of the year and group X2 at the end of the year, so they'd be, you know, 11 months older than
the X1s were. Well, you might expect different results. Children mature quite rapidly, and you might expect different results if one group is 11 months older than another. So that kind of maturation problem might be an issue.

And of course, above all, there's selection. X1 and X2 are different groups, so it depends on whatever has been done to select them. It's not uncommon in schools even today to have them streamed in some way: to have one class that have been doing well in their exams and their tests, the bright kids, and another class that isn't doing so well, the not-so-bright kids. What if that was the case in this school, with X1 the bright kids and X2 the not-so-bright kids? Again, you'd expect a difference because of that. You've got no control over that. This is not a true experiment; you're having to accept the groups as they are, and the selection of those two groups might affect your results. So on those kinds of grounds, again, Cook and Campbell suggest that we don't use this design; simply having these two groups and one observation of each might not be a good thing to do.

Third design, another variation on this. This time we have a before and after, but we only have one group. So we take some of the things from the first example, but we lose some of the good things in the second example. Rather than having two groups to compare, we look before and after. This way we might think we're eliminating the prior effects, the problem we had in the second example that the groups might be different. So we have a group, we observe them in some way, we then do something to them, and then we observe them again to see how they've changed. So let's imagine we're a psychologist experimenting on the
effects of alcohol on the ability to react when you're driving, so reaction time. Typically you might take a group of people and observe them: you measure their reaction times, you give them some driving-type tests, maybe in front of a video image of a road and so on, and you test how quickly they react to things. There's some nice machinery that can measure reaction time down to fractions of a second. Then you give them some alcohol, make them drink a fixed amount each, and then you test them again. You can do the same test again, the same reaction times and so on, and you can see whether their reactions have slowed down. You might expect them to slow down; we know that happens.

OK, so is that a good experiment? Well, again Cook and Campbell suggest no. The reason is that you've only got one group here, and all sorts of things can happen with just one group; we need some kind of control group alongside it. So we might have test effects: maybe doing the first observation, the first measurement, in some way prepares them for the second one. Back to my example of the road test, it might be that having gone through the video and watched all these things, your reaction time is better the second time around, because it's the same situation and you're expecting things to happen, so your reaction time gets lower because of that. So there's some kind of test effect, something in the way you set things up that means the differences are caused by your experiment, not caused by the group.

There can be instrumentation problems, a changing measurement scale from O1 to O2. It can be that the way you're measuring things changes over time: that you learn, or maybe experimenters do things differently or apply things differently. Something in the way you're measuring changes over time. It has nothing to do with the actual treatment, but something else
is going on that causes the differences, and you can't tell if that's the case or not.

Nevertheless, this is not an uncommon situation. I have to say, when you look through the published literature in the social sciences you find this kind of design again and again and again, and I have to admit that I have published work that uses this design. So although Cook and Campbell say try to avoid it, there can be reasons why, despite the dangers, you can work around it in various ways to try to ameliorate, to try to control, some of those dangers in the design. I've done work based on students' learning: I've got a class of students, I've measured them at the beginning of the year, I've done something to them, given them some teaching or whatever technology I've used with them, and then I've measured them at the end of the year. Now, I got around the problem to some extent by comparing with other kinds of situations in other universities and other tests that have been done, so I knew the kind of things that typically happened to groups in that situation, although I didn't have a control group as such to work with. So I could ameliorate that to some extent.

And of course, as before, the last point, as Cook and Campbell stress again and again, is that you have to look through and see if there are any possible threats. Can you think of anything that could have gone wrong that might have caused the differences, rather than the treatment that you were trying to test out? You can do that by going through the list of threats and almost checking them off: that didn't happen, that didn't happen, oh, that might have happened, I'd better be careful about that, I have to inspect that and see if it happened or not, and look in the background. So it's better, but it's by no means a perfect design, and ideally you should avoid it if you possibly can. Which takes me to, let me just...
oh yes, the regression effect. Another danger here as well, which I mentioned in the previous session; I think I talked about regression in one of the earlier sessions, regression to the mean. I talked then about how, when you do a retest, the chances are that the high performers will be slightly lower the next time round. This is a real problem if you're selecting groups based upon some kind of pre-test. It often happens in educational settings, more rarely, I think, in other settings, but in education very often we do some kind of pre-test and then we pick people on the basis of that pre-test.

So I've given an example here of a disadvantaged group as against a comparison group. The disadvantaged group might be disadvantaged in some way: it might be learning problems they've got, or it might be social disadvantage, or it might be that they've had less formal teaching or something of this kind compared with the other group, some reason why they're disadvantaged. Then typically in this kind of design you start to match them up, and you match them in terms of score. So we've got a matched group: we do a pre-test to see if we can match up the scores. What I'm trying to indicate here, on my scale across the top, is that on the left-hand side we've got the low scorers and on the right-hand side the high scorers, and the ticks across the line indicate where the people came. The people across the top, in the disadvantaged group, are doing worse on average than the comparison group; they're more to the left. But we can pick out pairs of people who match. So we've got two groups now who have been matched in terms of their pre-test scores, and we might think we've actually got a group comparison now. Let's imagine it's again teaching mathematics, and we're interested in our new method of teaching maths, so we look at their exam results from the
previous year and we pick the two groups on the same exam and pair them up, so one pupil from the disadvantaged group is paired with one from the comparison group who had the same mark on the examination. Now, the temptation here, when you run your experiment and you remeasure what's going on, is to think we've got some kind of control here, a bit like random allocation really. The trouble is that what we get is a regression to the mean. Those who were scoring high on the previous occasion will tend to score lower; even if it's a different test, if it's in the same kind of area, like mathematics, they will tend to score less well the second time round. And those who scored low the first time round will tend to score higher, on average, the second time around. Some won't, but some will.

What's going on here is that any score you get is a combination of a reflection of the underlying factors, your ability in doing whatever test it is, for example, along with other factors that are in a sense matters of luck: it was a good day that day, you had a good night's sleep, you happened to revise the right kind of questions for the test, things like this. All those factors might not happen the second time round. So there's the underlying ability, which is the main thing contributing to the score, and these other factors, and of course on the retest those other factors change. So you might still be good, but you won't be quite as good. Or maybe you didn't get a good score the first time around and this time luck was with you, so you get a very good score. So it changes, but on average those at the extremes would be those that had both good luck and a good underlying ability, so they're going to be the ones that tend to have a bit less good luck the second time around, and therefore they
move back to the mean. So we get this kind of situation. I'm not sure that diagram shows you very much at all; I think I might describe it as a bit complicated. But what I'm trying to show here is that, from the top, that's the original pre-test situation, and people are moving. I think there are two things to illustrate from this. If you follow the dotted lines, that's the disadvantaged group and where they end up: some move up, some move down, they have slightly different scores. The comparison group is the second line down, and they move to other scores as well, some of them good, some of them less good, and so on. Overall the pattern is the same the second time round: the ones at the bottom, the disadvantaged group, are still lower than the comparison group, as you might expect; on average that's what happens. But the individuals have moved such that the disadvantaged group actually end up doing slightly worse than the comparison group. So it looks like we've got a difference, but actually it's simply because of this regression to the mean that's going on. The comparison group have regressed to their mean, which is a higher mean than the one the disadvantaged group regress to. So if we did an experiment based on this we'd find a result, but it would be a false result; it would be simply because of the regression to the mean. So, again, another one to avoid. You've got to be very careful to avoid those kinds of experiments, Cook and Campbell say.

Let me just finish with probably about 15 minutes' worth of slides going through the approved versions of the designs for quasi-experiments that Cook and Campbell come out with. These are not perfect; these are still quasi-experiments, so they're still subject to all kinds of problems that are avoided by having proper randomization to groups. But in some ways they are better, because we can start to see what's going
on and make some deductions about what's happening, based upon the results.

So here's the first design they suggest is a good one to use, not perfect, but better than the others that I've talked about so far. They call it the pre-test post-test non-equivalent groups design. You can see it's effectively a combination of two of the situations we had in the previous ones: we've now got two groups, and we've got a before and after as well. So group X1 is observed, then it's given the treatment, and then it's observed again; or it's measured, then it's given the treatment, then it's measured again. And we have a second group, X2, that is observed, and then they get something different, they are the control group or they get a second treatment or whatever, and then they're observed again, O4. This is a much better design. Now we can begin to see if there are differences between the group that has had the treatment and the one that hasn't, and we can see how they have changed over time as well. So we might begin to eliminate some of those effects that are caused by the groups being different in the first place, X1 being different from X2.

But there is still a bias possible. We've still got that selection bias: it still may be the case that group X1 is different from group X2 in some significant way. We haven't chosen at random, and whatever other reason they were in those groups might matter. Maybe the people in X1 are the ones who respond well to the treatment and the people in group X2 don't respond well, or something like that, and that's why we get the different results we're getting. That's still a possibility and we have to look into it. In fact, that's the treatment-selection interaction I've just talked about: the way we've selected the groups is such that they respond well to treatment or don't respond well to treatment, one or the other, so we get the interaction. More importantly,
Cook and Campbell suggest that the way we interpret results here depends on the kind of outcome we get, and they give some charts which try to lay out the different results. Here's the first of those. We might get a result like this. Let me explain what this diagram is showing you. We've got the two time periods, the pre-test and the post-test. To go back to the previous slide, O1 and O3 are the pre-tests, and O2 and O4 are the post-tests, the observation or measurement after we've given the treatment. The vertical axis, the y-axis here, is some kind of measure of what they did on average, what that group was like. You can see both groups are fairly close together to begin with, but the treatment group is scoring slightly higher. Then we give them the treatment and we test them again, we observe them again, and the treatment group goes up quite a lot, and the control group does go up, but not quite so much. So this might look as if we've found a difference: the treatment has increased their scores more than the control group's, so the treatment has had some impact on them.

But we have to be a bit cautious still, because it might just be, for example, an issue of measurement. The scale of measurement up the y-axis might not be a linear scale. The fact that they both increased might mean that, had the control group had a higher score to begin with, and of course they're not randomly allocated, so we just don't know whether that might be the case or not, maybe they would have increased as much as the treatment group have done. So that might be simply an effect of the way the scales operate; they're nonlinear scales.

However, if we get a result like this, we've got a much stronger basis for our conclusion that the treatment made a difference. Here the control group hasn't changed at
all before and after the treatment, but the treatment group have. In this case we know it's not a scale effect, because the control group have not changed, so that can't be an issue. So we know that what's happened here is likely to be the treatment. There still might be all the other problems of bias and so on that I talked about; you have to inspect it to see if that's the case. But we have at least much better grounds for concluding that our treatment has caused the difference.

And Cook and Campbell say even better is this situation: if the treatment group started off with a lower score, or lower measurement, than the control group, and then at the post-treatment they get a much bigger score, then we can be pretty sure that the treatment has done something that's caused that difference to happen. So we've got results that we can rely on much better; not perfectly, but much better.

OK, a second design they suggest is a good one is this one, the interrupted time series. Here, rather than just a couple of observations, we have lots of observations. We might start with just one group and we observe them over time: once, twice, three times, four times, and so on. So O1, O2, O3 are a string of observations; I've got up to O4. Then the treatment is given to them, or something happens to them, or we do something, and then we observe them again over time, once, and then a bit later O6, a bit later O7, O8, and so on. So we've got a whole string of observations, and what we're doing here is looking for a change in the pattern. Now, the numbers don't quite agree: I've actually got six points before and six afterwards, rather than four and four, sorry about that. But in the diagram I've tried to indicate that time is across the x-axis from left to right, so on the far left is the first observation, then the second, third, and so on. So we go across here as time goes on, and here's the treatment point,
here's where we give the treatment T, and then we observe them again across the period after that. What you can see here is that if you get this result you can be pretty sure something has happened. On average, and these are average figures for the group, the figures change a bit, but they didn't change that much until the treatment happened, and then they suddenly shot up and carried on being high afterwards. That's good evidence that the treatment made a difference. It's pretty solid. It's not absolutely sure, because we didn't randomly allocate, but nevertheless if you found something like this and it was this clear-cut, you'd say that treatment did make a difference: it caused the observations or the scores to change in that significant fashion after the treatment. So if you get that result, you've got a really good result.

But it isn't always that clear-cut. Sometimes you have this kind of result: it's the same before the treatment, then you give the treatment and it starts increasing, and it carries on increasing. Again you might say this gives you good reason for thinking that the treatment has had an impact. It's a cumulative impact, getting better each time afterwards rather than a step upwards, but nevertheless it's a clear-cut effect, and again good evidence that the treatment has had that effect on the respondents.

But we have to be a bit careful, because we might get this situation. Here we can see that they're getting better all the time as they go. This might be a maturation issue, for example: they're getting better as they go through time. The treatment seems to have had no particular effect; it has made no kind of change to that gradual increase in score as we go through time. But of course if you just do a bit of statistics on this, you'll find that the mean score before the treatment will be about here somewhere, and the mean score after treatment up here somewhere, so there'd
be differences: the mean score before and after treatment is different. But when you inspect the actual chart you see that's a false perception; actually there's no evidence that the treatment has had any impact whatsoever. So if you do this kind of repeated-measures experiment, it's important to look at the graphs, to look at the figures across all the different observations, to see if you've got this situation, because here there's no evidence of the treatment having any impact.

And we might even get this complicated situation. It's not unusual in these time series to find that you get what's often called a premature effect, where the change appears to happen before the treatment. So you've got a fairly steady situation here, and then it starts going up before the treatment happens. On this ground you might say, well, actually it wasn't the treatment; it was something else that happened that caused that change, and therefore you've got no evidence that the treatment has had that causal impact. It's sometimes quite subtle; it's not obvious that that's happening. If I were to cover up half the diagram, it wouldn't be obvious from just the pre-treatment figures that that was going to happen. So close inspection is quite important, just in case there have been any of these premature effects, or even the other way around, I suppose, a kind of late effect, an effect happening after the treatment, but well after the treatment, which can't be used either. So inspecting the charts is important, and again it's an important point about quasi-experiments: look at the results you get in various ways, in particular using charts, as it's quite a useful way of checking that you've got a valid result, or a result you can rely on more than you could otherwise.

The last one is a slightly more complicated one
but quite an interesting one: the regression discontinuity design. This relies on the kind of results you get from doing a correlation of two variables. The idea is that when you do a correlation you normally get a spread of results: some people scoring low, some people scoring high. And if you do it twice, before and after, then you'll get some kind of relationship: those who scored low to begin with, other things being equal, will tend to score low afterwards, and those who scored high to begin with will tend to score high afterwards, other things being equal. So if you do nothing at all you get some variation, of course, you get a cloud of results, but you get a kind of line of results: the low ones staying low, the high ones staying high and the ones in the middle staying in the middle, roughly speaking, with some movement about but not a great deal. Now the discontinuity design basically splits that range of people in a pretest into two groups, those who scored low to begin with and those who scored high. You might say immediately, what about regression effects here, this is a problem. Well, the whole point about this is that it uses that kind of randomization in the regression effect to overcome this. What it does is then do a retest and display the chart of results, that cloud of results, to see whether there's been a shift in the results. So those scoring below some point on the pretest are separated and not given a treatment, and those who score high on the pretest, say, are given a treatment, and then we see whether that's made a difference to their results. What we're looking for is a discontinuity, and you get this kind of discontinuity in the results. So the chart here is a kind of display of the pretest results across the x-axis, left to right, and the post-test is the vertical axis, the y-axis,
and as you might expect there's a kind of randomization. These are individuals, so this person here scored the lowest score on the post-test but the second lowest on the pretest. That one scored the lowest on the pretest, but in fact they were about, what, one, two, three, about fourth lowest on the post-test. So there's a tendency, as you can see because there's a kind of line of results, for those who scored low on the pretest to score low on the post-test, and likewise those who scored high on the pretest to score higher on the post-test. Not absolutely, otherwise it'd be a straight line, but a kind of cloud of results. The interesting point is the group that scored high on the pretest, that's the ones who come up here, who were given the treatment, and in this case they've all lifted up a bit. So they've all scored better on the post-test than they might have been expected to otherwise; all their results have been pushed up the diagram a bit. That distance, that discontinuity, is approximately the distance they've been moved up. And this is quite a powerful test. There's no random allocation here, so it's not a true experiment, but nevertheless we can tell on the basis of this that there's strong evidence that the treatment did have an effect: it had the effect of pushing up the results of the high-scoring group in the way you can see on the diagram here. Now it's not always as clear as this, but if you get this kind of clear result, even though it's not a true experiment, you've got good evidence suggesting that the treatment did have the impact. Okay, so there are some good designs. Let me just finish with a few comments about the overall use of experiments and of quasi-experiments in social research. As I've said, I've emphasized that it's quite difficult to run field experiments, but not impossible. We've got one or two examples of those to look at, and
in fact the one you've been doing for the assessment is a field experiment. And that's why we think about quasi-experiments as a way to do things, a way of trying to get to grips with the fact that we can't properly randomly allocate in the field, but we want to be in the field because it's important to be there. So we may need to think about other designs to do that. I think these are quite useful. I mean, you might think of experiments as lab things, the kind of things you do in physiology and psychology and so on, and that they don't apply to many areas of life, when in fact they can apply to many areas, and more areas than you might think. We can use quasi-experimental-like designs in a wide range of fields. I've got a couple of examples to mention here, if I can remember the details properly. One was actually in a column in the Guardian called Bad Science, written by Ben Goldacre, who I think is taking some time out at the moment to write another book. He looks at the results of various kinds of scientific experiments and journal papers and things like this in various areas to see whether they've been done properly or not, and kind of reveals all the problems with scientific method that come up. I mentioned some of his work in a previous session when I talked about the impact of knowing who funds research on how you make judgments about the research. In fact his latest book is on that very factor: what if the medical research we're using is being funded by the drug companies and so on? That changes our view of that research, and in fact to some extent it changes the results we get as well, it turns out. Ben Goldacre is the man; he's got a couple of books out now. So Ben Goldacre has been writing about this. This particular experiment is a natural experiment. If I get this right, yes, the point was: do
academic papers that get mentioned in the newspapers, in the popular press so to speak, get a higher impact factor? Do they get read by more other academics? I don't know if you know about this, but for journals there is a measure of how often they're cited, called the impact factor. So if your journal paper is cited by lots of other papers later on it has a high impact factor, because it's been quoted a lot by other papers; a low impact factor is one that's not quoted very much, or not at all. Now the thing being raised here was: does being talked about in the media, if your paper gets into the headlines in the papers or onto the 10 o'clock news on the TV, make a difference to its impact factor? And what happened was a kind of natural experiment. What you'd really want to do is to have one set of papers that is hidden away from the media and another set that isn't hidden away, so we've got two randomly allocated groups: one randomly allocated group of papers that don't get into the media at all because we hide them away, and another group that aren't hidden, and then we can see if there are any differences. But of course we can't do that, it's just impossible. What happened here was a natural experiment, and it occurred because, I think it was the Sunday Times, no, it was the New York Times, sorry, an American paper, the journalists went on strike and the paper wasn't published. Or maybe it was the production people on strike; anyway, for whatever reason the paper wasn't published. But because it's a paper of record, they still wrote the articles, though of course nobody read them because they weren't published. So they had all the details there. So it was a kind of natural experiment: for a certain time this paper mentioned these journal papers and they talked about them
but of course nobody knew about it because it wasn't published. So we've got an actual experimental group here. We've got the period beforehand, where the paper was being read and people did know that certain journal publications were interesting and so on, and we had a period when they still knew which ones they were, because they wrote the articles, but nobody read them, and then things got back to normal and they went back to work again. What they did was compare those two groups. So we could know the papers that had been mentioned in the paper and look at those and see what their impact factors were, and then we could look at the period when they were on strike, when again we knew which journal papers they were looking at because they wrote the articles, but they were never published because the paper didn't come out, so nobody knew about it, and we compare those. So we had a natural experimental group and a control group, if you like, in this situation. And of course what they found was that it does make a difference. Interestingly, the number of times you get mentioned in the media, in this case in the New York Times, had an impact on the impact factor: other academics were taken in, if you like, by this, to read the papers more and cite them more. An interesting result, but the important point for our purposes is the natural experiment: that period when nothing was actually being published in the paper meant you had a control, if you like, where nobody was reading them, compared with the group beforehand [unclear]. Another example, which I came across some years ago, in fact I found the paper, is by Graham Farrell and John Thorne, two criminologists. Graham Farrell is now at Loughborough University, a professor there, and they'd done quite a lot of work back in the 1990s, I think it was, in Afghanistan. This particular paper is a kind of repeated measures design. They looked at
the figures in Afghanistan for opium production. This was using United Nations figures; the United Nations keeps records of how much opium is produced over a period of time. So over several decades they looked at opium production and they said, well, what has made an impact, what has caused a change in opium production? It's a big problem, because Afghanistan is a major source of opium and heroin, so a big problem for the rest of the world. And what they discovered was the thing that made the big impact. If you like, here was opium production, it was quite high up here, and suddenly it went down and it carried on low, and then it went back up again. And what happened during that period when it went down, a bit like this graph suddenly changing there, was a natural experiment, not one that we caused, we didn't manipulate it. What happened was the Taliban took over. The result of their paper is that they suggest, almost, if you want to cure heroin get the Taliban back, because what happened was that the amount of heroin being produced was quite high, and suddenly the Taliban took over and imposed new laws, I mean not nice laws, they were pretty strict and pretty violent about how they imposed them, but the heroin production went right down. And then when they were thrown out again the heroin production went back up again. So a kind of natural experiment over a period of time here, where the treatment was the country being taken over by the Taliban. Now I'm not suggesting that we deal with the heroin problem by having the Taliban in charge, because it wasn't a nice period for the Afghanistan farmers, but nevertheless it's a nice example of how that kind of thing happening in the world can give you this kind of natural experiment that you can then use in a time series approach. So you can use these features in research
designs in a variety of different ways, and it's always worth doing so, but you always have to consider the threats to validity: the things that might have happened that might cause you to get effects that aren't really justified by your experiment.
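
[Editor's illustration, not part of the lecture.] The earlier point about time series charts, that a steady maturation trend gives different before and after means even when the treatment did nothing, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the scores are invented, the function names `fit_line` and `compare` are hypothetical, and projecting the pre-treatment trend forward is just one simple way of doing the check, not a method taken from the lecture or from Cook and Campbell.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of ys = intercept + slope * xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def compare(scores, n_pre):
    """Return (naive mean difference, mean departure from the pre-treatment trend).

    scores: group averages at equally spaced observations.
    n_pre:  number of observations taken before the treatment.
    """
    pre, post = scores[:n_pre], scores[n_pre:]
    # Naive check: just compare the before and after means.
    naive = mean(post) - mean(pre)
    # Trend-aware check: fit a line to the pre-treatment points only,
    # project it over the post-treatment period, and see how far the
    # observed scores sit above that projection.
    intercept, slope = fit_line(list(range(n_pre)), pre)
    projected = [intercept + slope * t for t in range(n_pre, len(scores))]
    departure = mean([obs - exp for obs, exp in zip(post, projected)])
    return naive, departure


# Invented group averages: four observations before the treatment, four after.
step = [10, 11, 10, 11, 20, 21, 20, 21]   # roughly flat, then a jump at the treatment
drift = [10, 12, 14, 16, 18, 20, 22, 24]  # steady improvement, treatment changes nothing

step_naive, step_departure = compare(step, 4)
drift_naive, drift_departure = compare(drift, 4)
print(f"step:  naive={step_naive:.1f}, departure from trend={step_departure:.1f}")
print(f"drift: naive={drift_naive:.1f}, departure from trend={drift_departure:.1f}")
```

With these invented numbers the naive before and after difference is large in both cases, but only the step series departs from its own pre-treatment trend; the drift series sits exactly on the projected line. This does not replace inspecting the chart: a premature or late effect would still have to be spotted by eye.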