welcome back let's have some fun today I would like to test causal hypothesis with you previously we talked about using Excel to describe the variables you can compute the mean the median the mode standard deviation and you can use Excel and other programs to figure out other types of statistics as well but what if your project involves not a descriptive or not just a descriptive research question but also you'd like to establish a relationship between the variables for instance the research question could deal with the impact of gender on political affiliation women more likely to be members of the Democratic Party compared to men or possibly you're looking at multiple countries as you try to figure out whether the level of violent protests is high in democracies compared to non democracies compared to authoritarian state this would be another example of a causal research question that involves two variables the level of violence is a dependent variable and type of a country where there is democracy non democracy isn't it dependent variable and finally another example let's say we tried to figure out whether schools in the rural areas I differ they offer different conditions compared to schools elsewhere and more specifically we are looking at the student to teacher ratio does it vary across the schools of different type in this case type of school will be an independent variable and we try to see whether it effects student to teacher ratio which we regard as a dependent variable here's what you need to know about testing causal hypothesis and by the way some of these techniques are very complicated fairly sophisticated you might need to take a course in statistics I will just introduce you to a very basic hypothesis testing without any references to statistical significance and more advanced methodologies so but at the very bottom of any type of statistical technique or course all hypothesis testing is comparison a very simple procedure comparison of multiple groups and here's what we do we divide our sample of observations into multiple groups according to the values of the independent variable for instance and trying to see whether there is a gender gap in terms of in computing the United States I have two variables one is income that is my dependent variable and gender is my ended so step number one I will divide the sample a set of observations about female and male incomes into two groups I will look separately at males and I will look separately at females and then I will compare the value of the dependent variable income of males and income of females here is the basic logic of hypothesis testing if the mean of infants or the means of income that I observe are basically the same approximately similar then there's no relationship between the variables it means that gender there is not impact there's not a fact income however if I observe that the female income is very different from income of males that would imply that there is a relationship between the two variables that would mean that gender influences if let's go back to the previous examples that are used does gender influence political affiliation to test a particular hypothesis for instance I'm a hypothesized males are more likely to be members of the Republican Party females I will divide my sample according to the values of the independent variable I will look at males that will look at females and then I will observe how many males and how many females are members of different political parties and I will compare these memberships I will look let's say the membership in the Democratic Party what is the percentage of males what is the percentage of females if that percentage is approximately the same then there is no relationship between the variables there is no jam together that's the logic of that test let's look at the violent protest in democracies the test for particular hypothesis related to this research question I would look separately at democracies and separative authoritarian States and whatever you measure is a violent protest I will try to figure out a typical measure of violent protests and democracies among democracies and then I will compare these two numbers if they are significantly different and definitely this class I will ask you to use your judgement statisticians use more specific ways to identify what is a threshold of significance when the difference become prominent so going back to this example if the levels of violence are much higher much lower in democracies then I will conclude that in fact type of political regime matters it influences the level of violent protests so that's the basic idea behind most additional testing procedures there is always a comparison of multiple of the patience how do you accomplish it a person like that using Excel let's talk about that I will introduce two procedures for you and the first one is known as comparison of the means he is when you use the personally means if your dependent variable is inflow and your independent variable is not interval it could be nominal or ordinal then the usage of comparison that means is appropriate these were this precision falls just like in my original example of gender and income we divide our sample according to the values of the independent variable will look separately at the income of males and we'll look separated females as a group and then we calculate the mean income for males and females as previously discussed the mean can only be calculated for interval variables and that's the reason why comparison of the means which is a very frequent procedure in all sciences can only be used with interval dependent variables but just think about how powerful this procedure is that is the type of procedure that is using from suitable industry to test for drug effectiveness if you have a study with some clinical implications and you're looking at smokers and non-smokers and particularly health effects once again you may use comparison of the means you will separate the look at smokers non-smokers and then you will a measure a specific outcome and see what there's a gap with is it a difference between the two groups so that's what comparison of the means involves okay now here's where fun begins let's use Excel to actually yes a couple of hypotheses and I will introduce this one research question for you and will use this command equal average ifs in Excel it will allow you I know it might look a little bit complicated and confusing but the procedure is quite straightforward it allows you to figure out the means of your dependent variable for different values of the independent variable and here's how the command structure you will once again have to type it in equal sign average is all capitals open parentheses and after that you will have to enter the range of cells for your dependent variable so start with the dependent variable for instance if you think the set contains variable income you will have to enter the cells for instance t1 13 19 with semi column in it with column in the middle then after that replace common and we entered the range of cells for the independent variable or is this sex or gender we place a comma and we tell Excel a specific value of the independent variable for which we would like to you mean for instance if males are coded as one I will place a comma and then I will type one and close parentheses he's what we have accomplished when once this command is completed we tell Excel calculate the average income for males only once we accomplish that the next step is to repeat this command but instead of males we have to enter a value for females for instance if female respondents accorded s to the last number in this command line will be two instead of one and then if cell will calculate the average income for females and we'll use a specific example we'll go through this procedure so don't be intimidated by the somewhat abstract description and after that what we need to do is to compare the two numbers if the difference is significant tangible then you have to conclude to conclude that there is a gender gap if the two numbers are roughly the same let's say the average income of males is $31,000 and income of females is $30,000 probably you shouldn't be confident claiming that there is an income the United States so let's use a specific example here's a research question that I have for you is the level of corruption lower in democracies compared to non democracies the world dataset that is posted on the web on our Moodle site contains the four variables that we can use the first one is free of corruption free underscore corrupt this will be our dependent variable it varies from zero meaning a very corrupt state through one hundred one hundred means no corruption okay so this will be our dependent variable and there is another variable democracy regime it has two values one means that a country is a democracy zero means that a country is an authoritarian state it is a non democracy by the way at this point let me remind you how I know what these values are because when you look at the excel website at the excel file you will just see a bunch of different numbers I have a Cold War it came with a dataset typically said PDF file as a separate file in all these silence of this class I will tell me what these encoding are but technically you can find them on your own if you are interested ok so these are the Excel commands that you have to type in into the command alive first of all there will be two separate commands first of all we have to figure out what is the original corruption in authoritarian states you have to type equal average is open parentheses and then tell Excel what the range of values for corruption a degree of corruption is and this is calling T so T 2 3 T 192 you will see that that klom contains our observations about corruption and you have to place comma and then you need to tell Excel what our independent variable is type of political regime these observations are contained in column H from cell to through cell 192 and that's what we type in H 2 column H 192 place comma and then since I would like to start by figuring out the level corruption for authoritarian States I places 0 0 in the data set identifies a normal democracy and I close parentheses and at that point you will have to click on enter ok we do this then we repeat and we the same procedure by the set of 0 we'll look at democracies so let me go to our Excel spreadsheet is how it looks you are already familiar with this dataset click anywhere outside of the base of any Cell should work and then go to the command line and type in command line guys it's under the menus you see a function function of X sign and then there's a line for your way you can actually type in a command and then enter the command equals average it's open parenthesis identify your dependent variable it is a wrong key you can verify that a sorority identify your independent variable type of political regime that is in row H then place a comma and tell Excel that I would like to see the level of corruption on average for non-democracies 0 close parentheses let's click on anthem and here's some mathematical magic and we have a result the level of non corruption on the 0 through 100 scale in authoritarian States is thirty point six eight excellent now let's figure the same number the level of non corruption for democracies all that we need to do is to repeat the same command but instead of zero enter one at the end of your formula just remind you one stands for a democracy in this particular dataset and click on Anthony now Excel returns a result the level of non corruption in known in democracies is 46.8 that's it these are the numbers that would have to compare right now in democracies the level of non corruption is almost 47 in an authoritarian State in a typical one the level of non corruption is 31 can we see that democracies are less corrupt than non-democracies well we observed a difference of almost 16 percentage points I use about per cell phrase percentage points because this is a zero through 100 scale so 16 points correspond to 16 percentage points that is a fairly significant difference had this number had this gap being let's say 5 or 3 percentage points then I would probably be uncomfortable claiming that democracies are less corrupt and non democracies but in this particular case we'll look at the 16 percent gap I definitely can say that there is a fairly substantial difference between democracies and non democracies in terms of the of corruption this is how the comparison of the meets procedure works if you have any other questions please contact me I will provide you with other examples also note that there are multiple examples of how to use Excel for calculating the means and other relevant statistics available these days on YouTube have a good day