In this video I will talk about frequency distributions. We're going to learn how to construct a frequency distribution, including limits, midpoints, relative frequencies, cumulative frequencies, and boundaries. We're also going to talk about how to construct frequency histograms, frequency polygons, relative frequency histograms, and ogives. Some of this I will talk about in a separate video, but right now I'm going to talk about frequency distributions.

A frequency distribution is a table that shows classes, or intervals, of data with a count of the number of entries in each class. The frequency of a class, which we denote by the letter f, is the number of data entries in the class. If you look over here to the right, I have a frequency distribution: I have my classes, and I have the number of data values that fall within each class.

Each class has a lower class limit and an upper class limit. The lower class limit is the least number that can belong to the class, and the upper class limit is the greatest number that can belong to the class. The class width is the distance between the lower (or upper) limits of consecutive classes. The difference between the maximum and minimum data entries is called the range.

If you look at this frequency distribution, the class width is the difference between consecutive lower class limits. If you go from here to here, it's five: 6 minus 1 is 5. Do it again: 11 minus 6 is 5, and 16 minus 11 is 5. When you have a frequency distribution, all the classes have to have the same width. You can also do it with the upper class limits: if you took 10 minus 5 here, you'll get 5.
If you took 15 minus 10, you'll get 5, and so on.

So how do you construct a frequency distribution? Most frequency distributions can be done with technology, but I'll show you how to do it by hand so you can understand the concept. First, you decide on the number of classes. That's usually between 5 and 20; otherwise it may be difficult to detect any patterns.

Then you find your class width. To find the class width, you take the largest number in your data set and subtract the smallest number from it; that's the range, the max minus the min. Then you divide that by the number of classes that you want. You might end up with a decimal, but you round up to the next convenient number.

Once you do that, you find your class limits. You can use the minimum data entry, the smallest data value in your set, as the lower limit of the first class. To figure out the remaining lower limits, all you have to do is keep adding the class width. The upper limit of the first class is usually one below the lower limit of the next class when the data are whole numbers. I know that sounds confusing, but you'll see in a few minutes when I do an example. Remember, your classes cannot overlap. Then you find the remaining upper class limits.

Then you make a tally mark for each data entry in the row of the appropriate class, and you count the tally marks to find the total frequency of each class.

Let's go through a problem. It says this data set lists the out-of-pocket prescription medicine expenses, in dollars, for 30 U.S. adults in a recent year, and we're going to construct a frequency distribution that has seven classes. So this problem tells us how many classes we should have. This is the data set: one person paid 405 dollars out of pocket for prescription medicine, somebody else paid 290. That's what these numbers mean: how much each person paid out of pocket for prescription medicine.

The problem told us that we want seven classes, so to figure out the class width I'm going to take the maximum number, which is 405, and subtract the smallest number in the data set, which is 155. I divide that by 7 and, as you can see, I get 35.71, so I'm just going to round that up to 36. That's a whole number that's easy to work with.

Next, I use the smallest data value, 155, as the lower limit of my first class. I know that the class width is 36, so I can add 36 to 155 and that gives me 191. I just keep going: when I add 36 to 191 that gives me 227, and I keep going until I have seven classes. As you can see, I have seven lower limits.

To figure out the upper limits, you take one less than the lower limit of the next class. This lower limit is 191, so my upper limit here must be 190; you just go one back. This lower limit is 227, so this upper limit is 226. Now that you have a couple of the upper limits, you can just keep adding the class width, 36. If I add 36 to 226, that gives me 262; add 36 to that, and that gives me 298, and so on. So now I have all of my upper limits.

So I have my classes. Now you go through the data set and determine which data values fall within each class. If you look at this first class, 155 to 190, I have three tally marks because three data values fall within that class. If we go back to the data set, I know 155 falls in there, 165, and we've got a 168.
Those are all the data values that fall within the first class. If you look at the second class, I have two tally marks, so that means two numbers fall between 191 and 226. If I go back to my data set, I should have 195, and there should be another one: 200. Those two values fall within the 191 to 226 class, and you just keep going. As you can see, I have my frequencies over here to the right. Your frequencies should add up to the number of data values that you have. We had a total of 30 data values, so this frequency column should add up to 30.

Now that we know how to build the frequency distribution, we're going to calculate some other things. First, the midpoint: the midpoint is the lower class limit plus the upper class limit, divided by 2. Go back to that first class, which went from 155 to 190. We add those two numbers together and divide by 2, and we get 172.5. That's the midpoint, the middle, of the first class. Since we already know the class width, we can just add 36 to get the next midpoint: if I take 172.5 and add 36, I get 208.5. That is the middle of the second class. I'm going to show you a whole problem on that.

Next is the relative frequency of a class, and that's the portion, or percentage, of the data that falls in a particular class. You take the class frequency, the number of values that fall within that class, and divide it by your sample size, the total number of data values that you have. And that's it.

Then the cumulative frequency is the sum of the frequency for that class and all the previous classes. The cumulative frequency of the last class is equal to the sample size.

So we're going to use the frequency distribution from the previous example. We're going to find the midpoint, the relative frequency, and the cumulative frequency of each class, and describe any patterns. I'm going to do the first five classes, and then I'll show you the expanded version.

The first thing we calculate is the midpoint, and I already told you how to do that. For the first class, it's 155 plus 190, divided by 2, and I get 172.5. For the next one, I take 191 plus 226, divided by 2, and we get 208.5. As I said on the previous slide, we already know that the class width is 36, so I don't have to go through and add the lower and the upper limits and divide by 2 each time. I can just add 36 to each midpoint to get the next one. Like I said, 172.5 is the middle of the first class and 208.5 is the middle of the second class, so I add 36 to each of these midpoints to figure out the midpoints for the remaining classes. All I'm doing is the first five, so those are the midpoints.

Now we're going to talk about relative frequency. We know that we had a total of 30 data values in this set. So the relative frequency, the proportion of values that fall within this first class, is calculated by taking the frequency, 3, and dividing it by 30. I get 0.1. I do the same for the next one: I'll take 2 divided by 30.
I get 0.07. I keep going: the next one was 5 divided by 30, and I get 0.17; 6 divided by 30 gives me 0.2; and then 7 divided by 30 gives me 0.23.

Now we're going to talk about cumulative frequency. I'm going to write this a little better. Cumulative frequency is when you take your first value and then it just accumulates. The first frequency is three, so the cumulative frequency for the first class is three. For the next one it's three plus two, which is five. Then we take three plus two plus five and get ten. Then three plus two plus five plus six gives sixteen, and three plus two plus five plus six plus seven gives me twenty-three.

All right, so this is the expanded frequency distribution below. This shows all the classes; remember, I only did the first five. When you add up your frequency column, you should get 30. When you add up the relative frequencies, that should be 1. And if you look at the cumulative frequency, the last number in that column should be the total number of data values that you have. We had 30, so the last number should be 30.

From this we can determine some patterns. For instance, the most common range for the expenses is 299 to 334, because if you look here, this class has the highest frequency. Out of those 30 people, seven paid 299 to 334 dollars out of pocket for prescription expenses, more than in any other class. We can also see that about half of the expenses are less than 299. If you look here, this is 299. If you count up the frequencies above it, 3 plus 2 plus 5 plus 6, you've got 16 here, and from here to the end you get 14.
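
If you want to check the relative and cumulative frequencies with technology, here is a minimal sketch in Python. It uses only the five frequencies read out in the example and the sample size of 30; the variable names are my own, and the last two classes are left out because their individual frequencies are not stated.

```python
from itertools import accumulate

n = 30  # total number of data values in the example

# Frequencies of the first five classes, as read out in the example.
# The last two classes are not given individually, so they are omitted.
freqs = [3, 2, 5, 6, 7]

# Relative frequency: class frequency divided by the sample size.
rel = [round(f / n, 2) for f in freqs]

# Cumulative frequency: running total of the frequencies.
cum = list(accumulate(freqs))

print(rel)  # [0.1, 0.07, 0.17, 0.2, 0.23]
print(cum)  # [3, 5, 10, 16, 23]

# Values below 299 are the first four classes (155 through 298).
below_299 = cum[3]
print(below_299, n - below_299)   # 16 14
print(round(below_299 / n, 2))    # 0.53
```

The running total after the fourth class is 16, which is where the "about half" observation comes from: 16 of the 30 values are below 299.
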
So a little bit more than half paid less than 299 dollars. That's why you use a frequency distribution: it's better to see your numbers in a table rather than just looking at a big old data set. If you go back to the raw data set, you're not going to be able to determine any patterns, but if you put it in tables, charts, and graphs, then you're better able to see some type of pattern.
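
To recap the by-hand construction, here is a rough Python sketch of the same steps: class width, class limits, midpoints, and tallying. The function names `class_limits` and `tally` are made up for illustration, and the sketch assumes whole-number data. The full 30-value data set is not reproduced here, so the tally is run only on the handful of values mentioned out loud in the example. One more assumption: when the range divides evenly by the number of classes, the sketch still bumps the width up by one so the maximum value fits in the last class.

```python
import math

def class_limits(data_min, data_max, num_classes):
    """Return (width, [(lower, upper), ...]) for whole-number data."""
    raw = (data_max - data_min) / num_classes
    # Round up to the next whole number. If raw is already whole, this
    # still moves up by one (an assumption) so the maximum fits.
    width = math.floor(raw) + 1
    lowers = [data_min + i * width for i in range(num_classes)]
    # Each upper limit is one below the next class's lower limit.
    classes = [(lo, lo + width - 1) for lo in lowers]
    return width, classes

def tally(values, classes):
    """Count how many values fall within each (lower, upper) class."""
    counts = [0] * len(classes)
    for v in values:
        for i, (lo, hi) in enumerate(classes):
            if lo <= v <= hi:
                counts[i] += 1
                break
    return counts

# Min 155, max 405, seven classes, as in the example.
width, classes = class_limits(155, 405, 7)
print(width)        # 36
print(classes[:3])  # [(155, 190), (191, 226), (227, 262)]

# Midpoint: (lower limit + upper limit) / 2.
midpoints = [(lo + hi) / 2 for lo, hi in classes]
print(midpoints[:2])  # [172.5, 208.5]

# Only the values mentioned in the example, NOT the full data set.
mentioned = [155, 165, 168, 195, 200, 290, 405]
print(tally(mentioned, classes))  # [3, 2, 0, 1, 0, 0, 1]
```

With the complete data set in place of `mentioned`, the tallies would add up to 30, the same check we did on the frequency column.
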