Transcript for:
Descriptive Statistics with Excel Overview

okay welcome back this week I would like you to practice with some basic descriptive statistics there are multiple powers that you can use the calculator descriptive stats and also the test causal hypothesis we'll deal with that or legislated as well some of these professional programs available to us as well here at NC State for instance spss statistical package for the social sciences stated SAS but the semester I decided to use Excel because this is a program that is widely available and you if you don't have it on your home computer please contact me you can access any software including excel by logging into the university system by means of the so called virtual computing lab VCL not NCSU dot edu so access to excel will give you quite this week and the following week hey once again get in touch with me if you don't have excel if you need to figure out how VCL works but right now before i introduce you to how you used to come to use excel for figuring out basic descriptive stats let me describe a format for a typical data set data sets are collections of variables and when you open data set in any program it could be spss it could be so typically what you should expect there will be a meters of rows and columns each column typically there are some exceptions from that rule but typically each column with constitute and variable and here I have bathe yourself for you with individuals as individual respondents as a unit of analysis let's think of it as a an example but Punky public opinion survey so we have three columns color competing actual data sex age and support for gay marriage and that's what you will see in those data sets each row will be a set of observations for one respondent of course if your unit of analysis is a country or a community that each row will be a set of observations for that particular type of an organization country or whatever your unit of analysis is so in this case we have several roles as well the number of rows to response to your sample size in this example we have Rho 1 Rho 2 and row and corresponding to the last observation of the last respondent what also you should expect to see in a data set is a bunch of numbers when rows and columns intersect we enter a particular number that encodes a piece of information about that individual most businesses contains separate PDF files known as code books and that's why where you can disagree you need to consult is code books to figure out what these numbers mean and all the home assignments have tell you what these numbers imply because I have actually access to the code books and I decided not to overcomplicate the assignment but remember if ever you work on your own and you find a data center you would like to analyze it oftentimes things are not self descriptive but good things that is always accompanied by a cold boy and that's what what you need to look at to figure out or the numbers apply in this case age is fairly self-explanatory you can see that respondent was 55 years old responded to his 23 respondent and his 46 years old what about janitor we'll have a bunch of numbers ones and zeros in this case if you can saw the code book that's where you can find out where the one stands for male and zero for female maybe zero is male at the one it's female let's say that one is male so what we know about responding to it is that she is a female 23 years old and support for gay marriage once again the code book will tell you how this bill is operationalize what the scale is what the original question was let's say this is a 1 through 5 like a scale and ordinal scale that can be assumed to be almost eatable 5 means high levels of support 1 means complete opposition 3 means indifference no particular stance on this issue so we can see that respondent number 2 supports or exhibits high levels of support for gay marriage respondent number 1 is somewhere in the middle so that's main responder 55 years old and he is somewhere in the different on the issue of gay marriage so that's what a typical data said would appear to you when you open it in a program like Excel so let's practice a little bit and I have an assignment wave let's open the so called wouldn't be intercept I applauded this video sent to our website and as a variable in this dataset free of corruption freedom of corruption he theories let me tell you how its core it if there is from zero which means high levels of corruption through 100 100 means very low levels of corruption it implies that a country is basically free of corruption now let's together calculate the average the median levels of corruption for the world also standard deviation and the mode in Excel and I'll use examples using 2010 version of Excel there are particular formulas that we have to use to figure out different statistics you can also click your way but I'll use an example with the natural syntax and you can copy and paste it into you Excel problem as well and here's how things work out and let let me explain first of all please log on to the Moodle website and open I'm already here go to Excel data sets if you clip well it's all Davis says you will see a file here world to the world dataset as a collection of variables about different countries in the world when you double click on this file assuming you have a cell on your computer that's what you should see it might look intimidating a little bit guys don't worry you'll find you're okay that's how Davis has appear as we discuss know that there are multiple rows each row corresponds to one country and we have approximately hundred 90 countries in this particular sample of countries of the world Burundi Cambodia 32 is Canada 37 is China so on and so forth one of these countries is the United States and each row contains observations different characteristics or records for this particular country each column is a variable and I can tell you what these variables mean if you're curious most of this information by the way comes from open sources such as CIA world book or the World Bank data set and or enterprise institution witches I think that here in the United States so check out road team that is our non corruption variable for each country we can see that China for instance 37th in terms of this variable it is at 36 that's the level of corruption in China the scale wasn't gonna zero through 104 bali-ba 22 I see that believe is more corrupt than China according to this information it is at 30 and what is 70 70 is Barbados Southeast and Barbados is definitely much less corrupt than China and austere variable observation 11 when I would add the level of non corruption is an 81 so Austria appears to be a fairly North corrupt country so that's how you can look at baby you can look at specific countries but sometimes we would like descriptive stats sometimes we would like to summarize what the levels of corruption are for the world in general well he's what we need to do let me show you how this can be done in Excel for so let's scroll down the bottom and select any free cell and then click in the formula bar and enter the formula for the average that I typed for you in the PowerPoint presentation and I simply covenant pieces at you so here's how the funnel is the center's equal sign then using all capitals average open parenthesis and in the parenthesis you have to enter the range of values for which you would like to calculate the average so this is roll the key we start with cell number two this is the first country T two then you need to type column and the Ola has observation if you look at it Zimbabwe is 192 so the one cell is T 192 and we have to close the parentheses so that's the status it might appear to be once again a little bit confusing intimidating don't be afraid of that by the way that's how people use the program and request information from the computers these days oftentimes we simply click our way through different menus but technically that's what we need to do in Excel if you click on okay SPSS returns a number forty point for 195 for basically for it that's that's the mean that's the arithmetic mean let's figure out the median and I simply click on a different cell as a matter of fact I would enter the same formula but instead of average let me replace average with the command mean that's it that's all we need to do mean open parentheses and once again in the parentheses you need to indicate the range of values for which you would like to calculate the mean in this case we stay with the same variable T 2 3 T 192 well click on OK oh and there's an error message so let's troubleshoot let's see what went wrong in this particular case it's not even close to the paranthesis oh wait a second I was the wrong command tomatoes medium not get that explains it oh here we go the median is 34 it's okay to make a mistake sometimes and by the way typically I don't really use Excel but that's the kind of program that is really available to us so it's a nice skill to have so 34 immediately we can see that there is a positive skewness the mean the average 40.4 exceeds the median level of corruption which in this case by the way means that they dispositive eschewed there are some countries with very low levels of corruption a high number corresponds to a low level of corruption what is the mode the command for the mode is mode let me enter average and the command for the mode as you can find in our PowerPoint presentation is dot s mg/l that's the command for the mode and it's 19 so the most common observation the most frequent category in terms of corruption is 19 fairly low very low but because of this scale and finally standard deviation again the process is somewhat dependent events simple in that sense let's figure out a standard deviation for these numbers for this variable and the command here is as PD dot s by the way Excel suggests some of the different formulas very different lots of options this is the one that we use standard deviation not as s stands for sample here standard deviation in the sample I click on OK and it is twenty point eight nine and this is something that you can report you can say that corruption is measured on a zero to 100 Skinner 100 applying north corruption zero means very high levels basically absolute corruption on that scale with a median level of corruption is 34 that mean or average is 40 point four and the most common frequent observation is nineteen standard deviation is twenty point eight one more observation when you open a data set as this one you can see the so called null messages lots of them but the highlighted is one of these cells null means that there's a missing value that's how an Excel deals with the missing values if we were to use a different program like SPSS he would simply see an empty cell so this is a particular way of identifying the missing values in itself next time we'll talk about testing causal hypothesis for instance if you were to to compare democracies to non Demong how do we determine whether democracies are less corrupt and possibly more corrupt than non democracies I'll teach you a couple of procedures for testing such basic course of hypothesis next time