Hello, and welcome to today's lesson on sampling design. This is our first introduction to working with surveys and experiments. Whenever we're thinking about a survey (you had your two questions that you had to go through and gather data on yesterday), how do we gather data? We can use surveys and opinion polls; we see those all the time on Twitter, on TV, in the newspaper. We can use interviews. Studies can be an observational study; a retrospective study, where you're collecting data on things that have already happened, so you're just going back into the annals and collecting data; or a prospective study, where you're looking at things in the future. And then lastly, we can conduct an experiment to gather information.

Now, we have a lot of terminology that we have to get through to start this unit, to start our work in statistics. The first thing is defining the population: it's the entire group of individuals that we want information about. It doesn't have to be the entire population of the planet. It can be the population of a school or the population of a state. It doesn't have to be people, it can be things, but it's the entire group of individuals that we want to gather information about.

When we do a census (the United States does a census every 10 years, and it's a very lengthy process), it's a complete count of the entire population, so you're gathering information about every individual when you conduct a census. Why wouldn't we want to use a census all the time? It's not very accurate: there is a lot of error in censuses, and we'll talk about bias and different types of errors that come into play as we go through this lesson. It takes a long time, and it's almost impossible to get every individual. Doing a census is very expensive: you have to use the entire population, trying to get all that information, hiring people to do that work, testing all the product. And in some
cases it's impossible to do a census. If we want the average weight of a white-tailed deer, we're not going to be able to gather every deer and weigh it. If we're testing every vehicle for a crash safety rating, we would have to crash every car made by that one automaker. And that's what we're looking at here in number four, destructive sampling: you would destroy the population. Breaking strengths of soda bottles, lifetime of flashlight batteries, safety ratings for cars: in all of these situations you would destroy or lose the product in the testing process, so we would not want to do a census in that case.

So then what we want to do, and this is the basis of almost all statistics, is gather a sample of the data. A sample is the part of the population we actually examine in order to gather information. We use the sample to generalize about the population, so this is making an educated guess based on the information that we can gather from a small part of the population.

Now, when we're doing our sampling design, there are a number of ways to design your sample, and we want to make sure that we're doing it accurately and properly. We're using a method to choose the sample from the population. There are different ways to do this, some better than others, and we're going to look at those different possibilities here today.

Your sampling frame is the list of all individuals in the population. So for minaka high school we would list out every individual from the high school, and that would be the sampling frame. For this class I could just pick the class list, and that would be the sampling frame. But it is a list of every individual in the population.

So when we want to do a simple random sample: this is the most common process that's used. It's a situation where a set of individuals, whatever your sample size is, is chosen from the population in such a way that every individual has an equal chance of being
selected. And that's the important part of a simple random sample: every individual has that equal chance of being selected as a part of the process. So if we're going to take a simple random sample of 100 students, we put each student's name in a hat and then we just randomly select 100 names from the hat. Each student has that same chance of being selected. What we want to remember here, the most important word, and you're always going to have to use this word to get full credit, is the word random. We're doing a random selection from the names in the hat. And then every set of individuals has an equal chance of being selected, so every possible group of 100 students has the same chance of being selected.

We have to understand some of the limitations, though, of a simple random sample. Since it is random, it's possible that all 100 students chosen from the high school are seniors.

So now some different types of sampling that we can do. If we want to make sure that there isn't a possibility for all 100 to be seniors, what we can do is what we call stratifying. Divide the population into groups first, and that's the important part: the division happens before we randomly sample. So we have males and females; freshman, sophomore, junior, senior. There are a lot of ways that we can stratify; you just have to define the characteristics that you're dividing the individuals by. And then we do a simple random sample pulled from each of the strata. So if we want to take a stratified random sample of 100 MHS students divided by grade level, we can randomly select 50 seniors and 50 juniors if we want an upperclassman survey. If we want a whole-school survey, we would randomly select 25 students from each grade to get our 100-student sample.

This one isn't used as much, but we'll talk about it really quick: systematic random sampling. This is something that we see happen once in a
while. We're going to follow a systematic approach, but this is the most important part: it has to be randomized somewhere. So what we have to do is randomly select where to begin our sample. So if we want to do a systematic random sample of MHS students (and we've got to update these numbers, we're up to about 3200 overall), and we want a sample of 100, then with the numbers as written we need to select every 20th student from the group (with about 3200 students it would be every 32nd). Select a random number between 1 and 20; that gives us a starting point that's random instead of starting with a specific individual, and that ensures that each individual has the same chance of being selected.

Cluster sampling: cluster sampling is something that we can do for convenience, and it's something that's used often. It's based upon a location. We have some specific locations that we want to know about, so we take the locations, we randomly pick a location, and then sample everyone in that spot. So we could take all the math classrooms, randomly select a classroom, and then sample every individual from that room.

And then we've got a multistage sample. We're almost done here; I know this is a lot of information early on. This is selecting successively smaller groups within the population in stages, so we randomly select at each level. We could start with the entire United States and randomly select a few states, and then within those states we can randomly select a few counties, but you have to use a simple random sample at each of those stages. So we divide the period two classes by level, randomly select four second-period classes from each group, and then randomly select five students from each of those classes. So this is an important piece: getting down to that smaller group of individuals through a multiple-step process of selection.

Okay, simple random sample. We want to understand that there are advantages and disadvantages to all of these processes. The advantages of a simple random sample:
okay, it's an unbiased estimator, so it's not affected by areas of bias, and we'll look at some of those. It's easy to do. The disadvantages: large variance, so from sample to sample the results can be very different, and a sample may not be representative of the overall group. And you have to have a list of the entire population to be able to do a simple random sample.

A stratified sample. The advantages: a more precise unbiased estimator, because you're making sure that you're getting individuals from each group; less variability; and the cost is reduced if the strata already exist. The disadvantages: it's difficult to do if you must divide the population into strata, and the formulas for calculating standard deviation and confidence intervals are more complicated. When we're looking at the calculations, the more levels we add in, the more difficult the calculations get. And you still need the entire sampling frame.

Systematic random sample. Advantages: unbiased; it ensures that the sample is distributed across the population; it's an efficient process, because you can just go through and count off to get the individuals; and you don't necessarily need the sampling frame. The disadvantages: you can have a large variance, and it can be confounded by a trend or cycle. So if there's a trend in how people come into the building and you're sampling every 20th, you might miss some groups. And then once again, we're adding in multiple pieces, so the formulas become more complicated.

Cluster sample. Advantages: unbiased; cost is reduced; and if a sampling frame isn't available, that's okay, because we're selecting by area. Disadvantages: the clusters may not be representative of the population. If we sampled this class as a cluster, it might not be representative of the entire school, because we're all seniors and juniors, and it might not be representative of students of all levels, since this is an AP class.

Okay, so we're
looking at the sampling design here. We want to be able to identify what type it is. First: divide all colleges into groups of similar types and then randomly select three colleges from each group. That's stratified. Remember, they divided the schools up first; they found the strata first and then sampled, and that's what makes it a stratified random sample. Looking at this one: she randomly selects blocks in her district and then surveys all who live on those blocks. So she randomly selected an area and sampled everyone; that is a cluster sample. Every 10th customer: as soon as you see that, you should think systematic.

And then what we can do, and we don't do this as often anymore, is use a random digit table to do our selection process. But it's something that we want to look at and know and understand. Each entry is equally likely to be any of the 10 digits, and the digits are independent of each other. So we can have rows, and what we do is go through a systematic process to select the individuals for a given survey. We can read in any direction (up and down, side to side, diagonally), and the digits from 0 to 9 are equally distributed throughout. So if we wanted to select from this group, we've numbered everyone, because they all have to have an individual label, and we're going to use the random digits to do the selection. We're going to start with row one, reading across. Since some people have two-digit numbers, we have to give everybody a two-digit number, so Aiden would actually be 01, Bob would be 02. We've got 45, not in the list. 18, we've selected that person. 05, 13. 71 we leave out. 01. Ignore, ignore. 15. And then we stop when we have our five individuals.

Now our last piece for today is introducing areas of bias. So how can bias be introduced into a survey design? No matter what type of design it is, we have specific types of error that come in that affect or favor certain
outcomes in the situation. You can see there are a lot of different ways we can introduce bias. So now, sources of bias: things that can cause bias in your sample. We can't do anything with bad data.

Voluntary response: this is probably the number one area of bias in any survey or opinion poll, and based on the term you can pretty much guess what it is. People can choose whether to respond or not. When this happens, we're going to get extreme viewpoints. If a person's just in the middle, they're not as likely to call in, but if you're really against something or really in favor of something, then you're going to volunteer to respond. We can see these examples: all of your call-in shows, American Idol, America's Got Talent. In all of those, people select themselves to participate. Make sure to ask: did they select themselves? Any product review is a voluntary response. People are volunteering to respond to a product review, so you're going to get your extreme viewpoints.

Convenience sampling: asking people that are easy to ask. These two almost go a little bit hand in hand, because a lot of voluntary response is also a convenience sample. Asking the people in your classroom: most of the survey information that you gathered for that day-one experiment was a convenience sample. You were just asking people that were easy to get to. It can produce biased responses because it can fall right into a voluntary response, but here you're getting information from easy sources, and that can change the results you get.

Undercoverage: when we look at undercoverage as a bias piece, who is not being represented in the sampling process? If it's a phone-in poll, people that don't have telephones. If it is a newspaper or TV poll, people that don't read the newspaper or don't watch the TV show are being undercovered in that sample. So here are some examples for the phone-in polls: unlisted numbers,
people without phones. So you're losing some of those groups.

Nonresponse: this is always a choice that an individual has. They can choose not to respond to a sample, and we're losing their information. They were chosen, but they refused to participate. This is not self-selection. If you walked up to somebody and asked them the question and they chose not to respond, that's nonresponse. It's not opting out of a voluntary situation. So you have to understand: for nonresponse, they're chosen but they refuse to participate. When you look at telephone surveys, most people don't even answer their phone anymore if they don't recognize the number; that is a nonresponse situation. And then we're making the follow-up.

You like the crickets in the background? I don't know if you can hear that, but somebody let some crickets loose in the school, so that's a fun first couple of days we've got here.

And then response bias: people responding in the way that they think the interviewer wants to hear. It occurs when the behavior of the respondent or interviewer causes bias in the sample. A response bias could be, if you're asked by a police officer about an illegal activity, your response is going to be biased because you're talking to a police officer and you wouldn't want to incriminate yourself. So, as we just talked about, a uniformed police officer asking a class about drug abuse: we would not get honest answers in that situation.

The wording of questions can be a response bias piece. "Based on all the shootings in Chicago, should we have more gun control laws?" You're leading people to a specific type of answer. So you have to make sure your questions are worded as neutrally as possible; this is something that you want to look at in your survey questions. The use of big technical words can introduce bias, because people just not
understanding the term would affect how they respond to the individual piece. So you have to think about the group that you're working with. I always drive by the sign, and it seems a little funny to me: Nimrod, Minnesota. You want to be careful of your vocabulary there. If you're surveying doctors, you can use more complex technical terms.

So let's look through a few situations here, identify the source of bias, and then we're wrapped up for the day.

Before the presidential election, a survey of 10 million people predicted that Roosevelt would lose. The Digest survey came from the magazine's subscribers. So what type of bias do we have there? Undercoverage. If they're only surveying their readers, we wouldn't have people from other groups.

What type of bias do we have here? Register receipts from students as they leave the bookstore during lunch one day. It's a convenience sample: you sampled the easiest way. And undercoverage: the fact that they're leaving at lunch, did they not eat lunch that day? Those would be undercovered. Did they buy their books another way? They would be undercovered.

Average value of a home in minona: one averages the price of homes that are listed for sale with a realtor. This is undercoverage. Houses that are not for sale wouldn't be included in that calculation.

And that ends our lesson for today. Thank you, and have a good day.
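
As a rough sketch of the sampling designs covered in this lesson, here is one way they could look in Python. Everything named here is an assumption for illustration and not something from the lesson materials: the function names, the made-up frame of 2000 students with grades 9 to 12, the room names, the class size of 30, and the digit string (which only imitates the read-through above; the two "ignore" entries are invented).

```python
import random
from collections import defaultdict


def simple_random_sample(frame, n, rng):
    """SRS: every possible group of n individuals is equally likely."""
    return rng.sample(frame, n)


def stratified_sample(frame, stratum_of, n_per_stratum, rng):
    """Divide into strata first, then take an SRS inside each stratum."""
    strata = defaultdict(list)
    for person in frame:
        strata[stratum_of(person)].append(person)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}


def systematic_sample(frame, k, rng):
    """Random start among the first k individuals, then every k-th after."""
    start = rng.randrange(k)  # 0-based version of "pick a number from 1 to k"
    return frame[start::k]


def cluster_sample(clusters, rng):
    """Randomly pick one cluster (e.g. a classroom) and take everyone in it."""
    name = rng.choice(sorted(clusters))
    return name, list(clusters[name])


def digit_table_sample(digits, max_label, n):
    """Read two-digit labels left to right; skip out-of-range labels and repeats."""
    chosen = []
    for i in range(0, len(digits) - 1, 2):
        label = int(digits[i:i + 2])
        if 1 <= label <= max_label and label not in chosen:
            chosen.append(label)
        if len(chosen) == n:
            break
    return chosen


# Hypothetical sampling frame: (student id, grade), grades cycling 9..12.
rng = random.Random(0)  # fixed seed only so the sketch is repeatable
frame = [(i, 9 + i % 4) for i in range(2000)]

srs = simple_random_sample(frame, 100, rng)
by_grade = stratified_sample(frame, lambda s: s[1], 25, rng)
every_20th = systematic_sample(frame, 20, rng)
room, members = cluster_sample({"room A": [1, 2, 3], "room B": [4, 5]}, rng)

# Invented digits: 45 18 05 13 71 01 90 18 15, labels 01..30, pick five.
picked = digit_table_sample("451805137101901815", 30, 5)
```

One thing this toy frame happens to show: because the grades repeat in a cycle of four and 20 is a multiple of four, `every_20th` ends up with students from a single grade, which is the trend-or-cycle problem mentioned for systematic samples.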