Transcript for:
Statistics Lecture Notes

Okay, so welcome back. Today is Tuesday, tomorrow's Wednesday, the last class day for the week. Thursday is the Fourth of July, so there'll be no lecture on that day. Whatever is completed by tomorrow, that will be all for this week and therefore for the fourth exam. I'm a little bit behind on the videos; I plan to catch up tonight. Editing the videos is taking longer than expected, like up to eight hours for a class, so I'm going to change the strategy and cut it down to just a couple of hours, but it won't be as thorough as I'd prefer it to be.

So we'll just pick things up where we left off yesterday: chapter 11, section five. Expected values can be a useful tool for evaluating investment opportunities. As a review of expected values: you have a whole bunch of values, x1 through xn, and each one has a corresponding probability. These are all the possible outcomes and these are their corresponding probabilities. Since these are all the possible outcomes, each probability should be greater than or equal to zero, and their sum should equal precisely one. With all of that in mind, your expected value is going to be x1 * P(x1) + x2 * P(x2) + x3 * P(x3), all the way up to xn * P(xn). So it is kind of the closest thing to an average within the realm of probabilities.

Going back to the evaluation of investment opportunities: Mark is going to invest in the stock of one of the two companies below. Based on his research, a $6,000 investment could bring him the following returns. We have company ABC and company PDQ, with profit or loss x. For ABC: a loss of $400 with a probability of 0.2, an $800 profit with a probability of 0.5, and a $1,300 profit with a probability of 0.3. For the other company, PDQ: a $600 profit with a probability of 0.8 and a $1,000 profit with a probability of 0.2. What we're going to do is find the expected value of ABC, find the expected value of PDQ, and then
compare the two. So: find the expected profit or loss of each of the two stocks. Solution: if we look at ABC, we have -400 * 0.2 + 800 * 0.5 + 1,300 * 0.3, which gives us 710. PDQ has 600 * 0.8 + 1,000 * 0.2, and that gives us an expected value of 680. So in the long run, assuming that these probabilities and these stats hold up, we would expect about a $710 profit from ABC and a $680 profit from PDQ. Now, if any of these expected values came out to be negative, then that would be a loss. Notice that even though part of ABC's prediction includes a loss, a rather significant loss, it still comes out ahead. So I should say: consider investing the $6,000 in ABC.

Business and insurance. Expected value can be used to help make decisions in various areas of business, including insurance, and I'm pretty sure the insurance companies use this notion quite a bit. A lumber wholesaler is planning to purchase a load of lumber. He calculates that the probabilities of reselling the load for $9,500, $9,000, and $8,500 are 0.25, 0.60, and 0.15, respectively. In order to ensure an expected profit of at least $2,500, how much can he afford to pay for the load?

The expected revenues are as follows. There's our x, there's our P(x), and the third column is the product of the first and second columns. Add up that last column and we have an expected revenue of $9,050. But that's revenue. Remember that this is a business, and one of the most basic understandings about business is that expected profit is equal to the expected revenue minus cost. Now, the expected revenue we have, the expected profit we have, and the cost is what we need to calculate. So we're going to rework this equation: we add cost to both sides and subtract expected profit from both sides. We want an expected profit of at least $2,500, so we can change the equals sign to greater than or equal to. So whatever our expected revenue minus cost comes out to, we
want that to be at least $2,500. And when we add the cost and subtract the expected profit on both sides, the inequality still points in the same direction: cost is less than or equal to expected revenue minus expected profit. So that tells us how much he can pay. The expected revenue, that's from the table we looked at, that'll be the $9,050. The expected profit, that's from the original condition of the word exercise, $2,500. Taking the difference, we have $6,550. So, conclusion: if he pays at most $6,550 for the load, he can expect a profit of at least $2,500.

An important area within probability theory is a process called simulation. It is possible to study a complicated or unclear phenomenon by simulating, or imitating, it with a simpler phenomenon involving the same basic probabilities. Simulation methods, also called Monte Carlo methods, require huge numbers of random digits, so computers are often used to produce them. Simulation is using random numbers to mimic a phenomenon.

Question: toss two coins 50 times and use the results to approximate (so it's a simulation, it's not going to be an exact duplication of the phenomenon) the probability that the crossing of Rr pea plants will produce three successive red-flowered offspring. This goes back to the whole genetics exercise from earlier in the chapter. Capital R was the dominant gene, in this case for the color red, and little r is the recessive gene. If you had an offspring that had both capital R's, RR, then the color would be red. If it was capital R little r, or little r capital R, these would still be red, since capital R is the dominant gene. Little r little r would generate, if I remember correctly, a white offspring. Now, the RR would be sure to pass on the dominant gene to its offspring, but if the plant is capital R little r, no matter what
the ordering is, there's a 50% chance it'll pass on the dominant gene.

Solution: we actually tossed two coins 50 times and got the following sequence: tails heads, heads tails, etc. They're broken up into pairs. Heads represents the dominant R, tails represents the recessive r. You're tossing two coins 50 times, so we have 50 pairings here in this picture. By the color interpretation described in the text, this gives the following sequence of offspring. Note that only both tails gives white, since a head represents the dominant gene, so anytime you have a head it's going to be red. So that's going to be red, red, red; tail tail, white; red, red. Here you have two dominant, so red. Red, red, red, red, red, red, white. So essentially we have this for the breakdown.

Now we have an experimental list of 48 sets of three successive plants: the first, second, and third generations, then the second, third, and fourth generations, and so on. Do you see why there are 48 in all? This is the first, this is the second, this is the third, this is the fourth, this is the fifth. So there's one succession, there's the next succession, there's the next succession. By the time we get to the last ones here, we have run out of successive generations: this one cannot start a group of three, and this one cannot start a group of three. So we start off with 50, but we can only form 48 groups of three successive generations.

Now we just count up the number of these sets of three that are red red red. Since there are 20 of those, our empirical probability of three successive red offspring, obtained through simulation, is 20 over 48, or about 0.417. So: red red red, there's one. Red red white, no. Red white red, no. White
red red, no. Red red red, yep, that's two. Red red red, three. Red red red, four. Let's see, red red red, five, etc. There are 20 of those, 20 out of 48, so that gives you 5 over 12, or about 0.417. This is kind of tricky, so if you want me to break this down further, I'm willing to generate an extra video to try to do that. So let me know.

Simulation with random numbers. A couple plans to have five children. Use random number simulation to estimate the probability that they will have more than three boys. Solution: let each sequence of five digits, as they appear in table 17, page 674 from the text, represent a family with five children, and arbitrarily associate odd digits with boys and even digits with girls. Recall that zero is considered even. Verify that of the 50 families simulated, only the 10 marked with arrows have more than three boys, that is, four boys or five boys.

Okay, remember the original question: estimate the probability that they will have more than three boys. So here it is. On the left-hand side there's this column of five-digit numbers. Those aren't really five-digit numbers; those are really single-digit numbers grouped into groups of five. Notice that some of them have arrows to the left of them. Let's see: one, two, three, four, five, six, seven, eight, nine, ten arrows. We'll take a look at the top number here: 5 1 5 9 2. That's not to be read as 51,592. The five is odd, so this will represent a boy. This is odd, so another boy. Again odd, so a boy. Again odd, boy. This one is even, so this will represent a girl. The next one with an arrow is 5 3 0 3 3: boy, boy, this one is even so girl, boy, and boy. So the first two that have arrows each represent four boys and one girl. Why that
association? Why not? That is the easiest way to break it up. You have ten digits available, so you can easily split them up into two groups. If you split them into odd and even, you'll have five odds and five evens available, so it's easily broken up into smaller but equal-sized subsets. And remember, this is just a simulation. Now, how you get these digits, that's a different deal. Again, by calculator, or there are resources online that will generate random numbers, or as close to random as possible. But if you're just dealing with digits, then, holy cow, now you have to decide how you're going to convert those digits into whatever thing you're studying. Here we're talking about a family with five children, so you work out a way to convert each digit into a boy or a girl.

So there are actually 50 groups of five digits, and the arrows mark those where you have more than three boys represented. That is where this probability comes into play. The probability of more than three boys: there are 10 of these groups of five digits that represent more than three boys, so four boys or five boys, out of 50 groups of five individual digits. So 10 over 50 equals 0.2, and that is the probability that you are looking for.

That completes not just that section but chapter 11. So chapter 10 was counting, chapter 11 probabilities, and now the next chapter: statistics. Statistics show up in practically many places, so you might as well have some idea of what's going on when statistics are being used. For this first section we'll look at visual displays of data, so graphs and charts, that kind of stuff: construct and interpret data displays; construct and interpret frequency and grouped frequency distributions; construct and interpret stem-and-leaf displays (I've seen stem-and-leaf displays before, but they're not very common; the author wants to bring them up, so if you see one you'll know what it means); construct and interpret bar graphs, circle graphs, and line graphs.

So here are some basic concepts. A population includes all items of interest. That's it. This is kind of equivalent to the sample space in probabilities, but not entirely. And within this population, a sample includes some, but ordinarily not all, of the items in a population. So the population is the universal set, and a sample is essentially any subset of that population. Sometimes the sample is the whole population, and that would be an example of a census.

One common thing in this arena is polling. The population would be some portion of the US population, a portion like ages 18 through 25, or party affiliation, like all Democrats. Or the population could be the population of California or of Texas; it doesn't have to be the whole country, it could be a state. Now, each of these portions has something like tens of millions of people, so if you want to poll all those people, that's quite impossible. What you'll do is generate a sample from that population and study the sample, because the sample will be the stand-in for the population. It will never be a
perfect stand-in for the population, but at the same time, when you do any kind of study you're going to have a time investment, a people investment, a money investment, and other types of investment. And the larger the sample is, the greater the need to keep very accurate records of what you're gathering. How are you going to generate these samples? Well, there are different ways of doing it, and if you take a stats class you can get some basic idea of how to do all of that.

So the sample stands in for the population, though it is rarely perfect; really, there'll never be a perfect sample from a population. You want it to be perfect? Fine, do the whole population. But remember, the larger the group is, the more time it'll take to study that group, and if you're the only one doing it, holy cow.

Now, the population doesn't have to be a big group. Suppose you wanted to study all the astronauts who went to the moon. There have only been about a dozen people who've walked on the moon. So there's an example of something that would have a small population. In that case, whether you generate a sample or not depends. You might be forced to use a sample, because if part of your study is to interview these people, well, some of them have already moved on beyond this realm, and there aren't that many astronauts who walked on the moon that are still alive.

So anyway, the population is the group that you want to study. It doesn't have to be people. The population could be, holy cow, I want to study all bears: what kind of diet do bears have, what are their general measurements, their general weights, that kind of stuff. You can't study all bears, but you can try to get a
sample. Say I only had maybe 20 bears in the sample. You go out, you tranquilize them, and you take your measurements, and then kind of dig into their stomachs to see what the contents are like, so you have a better idea of their diet. I'm not sure how you're going to weigh a bear.

But your study could be, hey, I want to study how polluted a particular lake is. Well, you can't study every gallon inside it, so what you're going to do is take certain samples. Where are you going to take those samples? At random spots throughout the lake. There is your sample. So it goes on and on and on. Bottom line: the population is all the items of interest, and a sample is some subset of that population. It's the sample that you will be studying. It represents the population, and you try to get it to represent the population as best you can, but it will never be a perfect representation.

Okay, basic concepts. Information that has been collected but not yet organized or processed is called raw data. It is often quantitative, or numerical, but it can also be qualitative, or non-numerical. Quantitative examples would be height, weight, blood pressure, any kind of time measurement, group size, lengths, any form of measurement. Qualitative: softness, party affiliation, eye color, things that cannot be described in a numerical sense. Once you put a number to something, well, with numbers comes order, so you're automatically saying, oh, I can order these things. Well, how do you order eye color? Brown, green, gray, blue: who's to say that one color is better than another? You could say that, but that's more of a personal expression rather than a universal one.

Descriptive statistics consists of collecting, organizing, summarizing, and presenting data or information; in short, describing the data. These descriptions can apply to
populations or to samples. Inferential statistics consists of drawing conclusions, making inferences, about populations on the basis of information from samples taken from those populations. With descriptive statistics, graphs are often involved in describing the data, bar graphs and so on, and we'll get into some of those.

For inferential statistics, you have your population, and from that you have your sample. When you study the sample you get information, but remember, the sample is a representation of the population, so the information is inferred back onto the population. Like I said, the sample is never a perfect representation of the population, and because of that the information is never a perfect description of the population. So: margins of error. If your sample says that the average height in the sample lies here, then when applying that information to the population you can't say, oh well, that's the average of the whole population, because you haven't counted everyone; you've only counted the sample. So what you do is add a margin of error to say, okay, I'm 95% confident that the population average height is somewhere in this region, within this margin of error. We won't dig too deeply into that. If you want to do that, that's what the fundamental stats course, Stat 300, is for. This is just to give you an idea of what inferential statistics is.

Polling is probably the easiest example because it's so ubiquitous; there's so much polling going around. A poll takes a sample from the population, usually about 1,200 to 1,500 people. The larger the group you have, the better it's able to represent the population, but the larger it is, the more expensive it
is, time-wise, people-wise, and money-wise, to study that sample. And somewhere around that size, if you make the sample larger, you're not going to gain that much more accuracy in your information. So generally it's 1,200 to 1,500 people. With polling, say 47% are in favor of, I don't know, Rocky Road ice cream. They'll give the percentage, but often the fine print will include a margin of error, plus or minus 2%, and sometimes they will also give the sample size. Whenever you see this plus or minus 2%, 3%, 1%, this is your 95% confidence margin of error. The sample said 47%, but the actual population percentage is most likely somewhere between 45% and 49%. Sometimes they'll even throw in "polling based on a sample of 1,317 people." If you find these polls, even the political polls, whatever they may be, always look for this margin of error, and if you can, find out how big the sample size was. If it's something like 580 and you're looking at something that's supposed to represent the whole United States, well, sorry, 580 is not going to cut it. That's too small.

Quantitative data: the number of siblings in 10 different families, 3, 1, 2, 1, 5. Qualitative data: the makes of five different automobiles, Toyota, Ford, Nissan, Chevrolet, Honda. As far as statistics is concerned, all of these are pretty much equal. It is only people and their personal preferences that rank these things. Now, I suppose you can do a survey, like a Consumer Reports survey, and rank them, but usually the ranking is based on some quality, like service or gas mileage or something along that line. If there is a ranking, it's going to be based on something, is what I'm saying.

Quantitative data can be sorted in mathematical order, like the number of siblings. So remember what I said: once you apply or
convert it to a number, now you can sort the data.

When a data set includes many repeated items, it can be organized into a frequency distribution, which lists the distinct values along with their frequencies. Frequency is just a simple word for how often something occurs. It is also helpful to show the relative frequency of each distinct item. This is the fraction or percentage of the data set represented by each item. So we'll see an example of that.

The 10 students in a math class were polled as to the number of siblings in their individual families. Construct a frequency distribution and a relative frequency distribution for the responses below. So here we have 3, 2, 2, 1, 3, 4, etc. That's the raw data, and now we're going to organize it into something more useful. Notice, when we go back here, what are the distinct numbers? Well, there's three, there's two, there's one, and then there is four. Those are all the distinct values. Once you've determined that, now you can ask: how often does each distinct value occur? Okay, well, three occurs one, two, three, four times. Next, two: one, two, three, so three times. One only occurs once, and then four occurs two times.

Now I will take all of this tallied information and organize it. The number x: 1, 2, 3, 4, those are all the distinct values that occurred in the raw data. How often does each occur? The one only occurred once, the two occurred three times, the three occurred four times, and the four only occurred twice. Now remember, we had 10 numbers to begin with, so if you look at the sum of these frequencies it should match the original sample size. So there's a check: 1 + 3 + 4 + 2 is 10. So this is the frequency. The relative frequency is: what portion of 10 is one, what portion of 10 is three, what portion of 10 is four, what portion of 10 is two? That's going to be 1 over 10, 3 over 10, 4 over 10, and 2 over 10, and often it's represented as a
decimal or a percentage.

Given this information, we can now do a histogram. The data from the previous example can be interpreted with the aid of a histogram: a series of rectangles whose lengths represent the frequencies are placed next to each other, as shown. Here are your ordered values, 1, 2, 3, 4. You have a width representing one, an equal width representing two, an equal width representing three, and another equal width for four. The x value is the data and the y value is the frequency. We notice that the largest frequency was four, so we're going to tack on a little bit at the top: the frequency axis goes from 0 to 5. Since the first value is one and its frequency was only one, this bar only goes up to one. For the three, that bar goes up to four, which is great, because that was the frequency of three siblings. So this is a different way of saying the same thing.

This next one is a frequency polygon. Simply plot a single point at the appropriate height for each frequency, connect the dots with a series of connected segments, and complete the polygon with segments that trail down to the axis. Here we're trading in the histogram for this frequency polygon. Remember, one sibling had a frequency of one, so that's why this dot is here. Two siblings had a frequency of three, so that's why this dot is here. Wherever the dot is, if you move over to the left, wherever you hit the y-axis, that is the frequency. If you move straight down to the x-axis, wherever you hit, that is the number of siblings being represented. So you draw all the dots, you connect all the dots, and for the left and right edges, well, these two lines are extra; they bring the polygon down to the axis.

A frequency polygon is an instance of the more general line graph, and we'll talk about that in a minute. A line graph in general is not a polygon.
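As an aside, the sibling frequency table and its relative frequencies can be tallied in a few lines of Python. This is only a sketch: the raw data on the slide is not fully read out in the lecture, so the `responses` list below is a hypothetical ordering chosen to match the stated tallies (one 1, three 2s, four 3s, two 4s), and the variable names are made up for illustration.

```python
from collections import Counter

# Hypothetical raw data: ten sibling counts arranged to match the lecture's
# tallies (1 occurs once, 2 three times, 3 four times, 4 twice).
responses = [3, 2, 2, 1, 3, 4, 3, 3, 4, 2]

# Frequency: how often each distinct value occurs.
frequency = Counter(responses)

# Relative frequency: each frequency as a fraction of the sample size.
n = len(responses)
relative_frequency = {x: frequency[x] / n for x in sorted(frequency)}

# Check from the lecture: the frequencies should add back up to the sample size.
assert sum(frequency.values()) == n

# Print one row per distinct value: value, frequency, relative frequency.
for x in sorted(frequency):
    print(x, frequency[x], relative_frequency[x])
```

The same dictionary could feed a bar-style plot, with the distinct values on the x-axis and the frequencies as heights.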
There's nothing to anchor these line segments to the x-axis. All you do is put the dots up, connect the dots, and that's it.

Grouped frequency distributions. Data sets containing large numbers of items are often arranged into groups, or classes. All data items are assigned to their appropriate classes, and then a grouped frequency distribution can be set up and a graph displayed. For example, suppose you're going to measure the heights of everyone in a city. You have a large number of people, okay, fine. You measure heights, and you get 60 inches, 60 and a half inches, 57 inches, 69 and a half, 80 inches, and not everyone comes out to be a whole number of inches. It may be a half an inch, a third of an inch, a fourth of an inch, 7/16 of an inch, or 63 and 5/8 inches. Rather than having a frequency for 60 inches, a separate one for 60 and a half, and another one for 70 and a quarter, what are you going to do? If you're going to do a quarter inch, why not a half inch and 3/4 of an inch? Holy cow, if you're going to do 5/8 of an inch, why not a group for 63 and 1/8, 63 and 2/8, which is 1/4, 63 and 3/8, 63 and 4/8, which is 1/2? Are you going to have a separate frequency for each one of those heights? Yeah, that's kind of odd.

So one thing you could do is consolidate, and that's all we're doing here. 60 inches is going to represent anyone that is from 60 inches to less than 61 inches. 61 inches is going to be everyone who is at least 61 inches and less than 62 inches, etc. That might still be problematic. When you're doing these frequency distributions, you don't want a large number of classes, because people are going to be reading these things. Quite frankly, all of these graphs and such are just for the benefit of people, and they have to be understood fairly
easily by people too, so the simpler the graph the better. Say these heights are going from 60 to 80 inches. If you broke them down into classes of a quarter inch or an eighth of an inch, that's going to be rough: with eighths, that's about 160 classes. That's way too much. You can consolidate like so: anything that's 60 up to 60 and 7/8 of an inch, they're all going into a class called 60. That's going to cut it down to roughly 20 classes. You can consolidate even more and have your classes be, say, 60 to 65 inches, 65 to 70 inches, 70 to 75 inches, 75 to 80 inches, and, let's see, 80 to 85 inches; well, now we're getting to 7-foot people and such. And remember, we do have people who are shorter than 5 feet, so maybe throw in 55 to 60 inches, so anyone who's at least 55 but less than 60. So these could end up being your new groups.

Notice, number one, that all the classes put together cover the whole range of heights; any data that you have should fall into one of these classes. Number two is that they don't overlap. And three, equal width. These are common properties of classes. And often you want to limit the number of classes to probably no more than 8 to 10; anything beyond that makes it a bit complicated.

So, guidelines for the classes of a grouped frequency distribution. Make sure each data item will fit into one and only one class. Try to make all classes the same width. Make sure that the classes do not overlap; these are tied together, because if they do overlap then you're allowing the possibility of some data item being represented in more than one class. Use from five to 12 classes; too few or too many can obscure the tendencies in the data.

So we're going to construct a histogram. Twenty students, selected randomly, were asked to estimate the number of hours that they had
spent studying in the last week, in and out of class, and the responses are recorded below. Bang, here's all the raw data, and it looks like they have been, nope, they're not ordered. So this is your raw data. Tabulate a grouped frequency and relative frequency distribution and construct a histogram for the given data.

Solution. Notice we have this 15. We could do a class for just 15, but hey, is there a 16 somewhere in here? No, not really. So that's going to look kind of odd: 15, then 16, no, 17, no, 18, no, 19, etc. So, the range: the values range between 15, I believe that is the smallest, and 58, the largest. Here are our classes: 10 to 19, 20 to 29, 30 to 39, 40 to 49, and 50 to 59. Remember, the range is from 15 to 58, so notice our classes cover that whole range. Number two, they're of equal width. To calculate the width, you're going to look at 10 and 20: the difference between 10 and 20 is 10, okay, fine. The difference between 30 and 20, that's 10. The difference between 40 and 30, that's 10. The difference between 50 and 40, that's 10. So they all have a width of 10. If you were to continue with this, the start of the next class would be 60, based on the pattern, and the difference between 60 and 50, oh, that's still 10. So yeah, equal width for each class.

Now you might say, well, wait a minute, why not just go 19 minus 10? Well, if you take differences just within each class, then you have a problem, because this has a width of nine, this has a width of nine, this has a width of nine, this has a width of nine, but if you notice, holy cow, there is a hole here. So calculating the width using values within each class is not helpful. You want to look at the lowest measurement in each of the succeeding classes. That's why you're not going to look at 19 minus 10; you're going to look at 20 minus 10, 30 minus 20, and so on.
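The class-width rule described here can be sketched in Python. This is an illustration under assumptions, not the textbook's procedure: the lower class limits 10, 20, 30, 40, 50 are the ones used in this example, while the helper name `class_label` is made up, and it assumes whole-number data.

```python
# Lower class limits for the study-hours example.
lower_limits = [10, 20, 30, 40, 50]

# Class width: difference between successive lower class limits
# (20 - 10, 30 - 20, ...), not upper minus lower within one class.
widths = [b - a for a, b in zip(lower_limits, lower_limits[1:])]
assert len(set(widths)) == 1  # all classes share a common width
class_width = widths[0]

def class_label(value):
    """Return the class, as 'lower-upper', that a whole-number value falls in."""
    for lower in lower_limits:
        upper = lower + class_width - 1  # e.g. lower limit 10 -> upper limit 19
        if lower <= value <= upper:
            return f"{lower}-{upper}"
    raise ValueError(f"{value} is outside every class")

# The smallest and largest responses mentioned in the lecture.
print(class_width)       # 10
print(class_label(15))   # 10-19
print(class_label(58))   # 50-59
```

Measuring within a single class instead (19 minus 10) would give 9, which is the kind of calculation that leaves a value such as 20 with nowhere to go.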
You're anchoring the width measurements among neighboring classes. In this bad example the classes have a common width, but now you have a hole: okay, where's 20 going to go? Here you have no holes, there are no gaps, there are no overlaps. Every data item is going to fall into just one place, and each class has a common width.

So when you look at the grouped data, this is what you have. There was only one value that fell into the first class, in the next class you had five, in the next one you had three, then five, and then six. Now, just to make sure that things work out: if I add these together, does that give me the 20 that I should have originally? So 6, 9, 14, 20. Yep, it does, the original sample size.

The numbers 10, 20, 30, 40, 50 are called lower class limits. They are the smallest possible data values within their respective classes. The numbers 19, 29, 39, 49, 59 are called the upper class limits. The class width for the distribution is the difference of any two successive lower or upper class limits. In this case the class width is 10. So this is what I said earlier: for class width, you're going to look at the difference of two successive lower class limits or upper class limits.

Stem-and-leaf display. The tens digits to the left of the vertical line are stems, while the corresponding ones digits are leaves. The stem-and-leaf display conveys the impression that a histogram would, without a drawing, and also preserves the exact data. When you make the conversion from the raw data into a grouped histogram, one thing the grouped histogram does is lose the sense of the original data. For example, when we looked at the 10 to 19 class, that had a height of one. The only raw data value that fit this range was 15. Once you create your histogram and you look at this, all you know is, oh, there's one value in here. Oh, wait a minute, that
value could be a 10, it could be an 11, it could be 12, 13, 14, 15, 16, 17, 18, 19, but that's the most that I know about this information. So what stem-and-leaf displays do is give you the best of both worlds: they give you the idea of a histogram but they also preserve the original data. So we present the data from the last example in a stem-and-leaf display. Here's the original data, and now we're going to convert it into a stem and leaf. These are your tens. So this five represents 10 and 5, which is really just 15. This nine represents 20 and 9, which is 29; why 20? Because that's a two in the tens place. This six represents 30 and 6, which is 36. This seven represents 50 and 7, which is 57. If I had values for the 70s but nothing for the 60s, that's fine: you'd put in a six but leave its leaves blank, and then seven and whatever values you have. So that's the idea of stem and leaves. Notice that the leaves are all in order. You can kind of get the idea of a histogram, because how many numbers are in the tens? Well, there's only one number, the 15. There are one, two, three, four, five in the 20s, three in the 30s, five in the 40s, and one, two, three, four, five, six in the 50s, which is what was conveyed in the histogram earlier. But unlike the histogram, you can reconstruct the original data, like done here: the smallest value is clearly 15 and the largest value is clearly 58.

Bar graphs. A frequency distribution of non-numerical observations can be represented in the form of a bar graph, which is similar to a histogram except that the rectangles are usually not touching one another and sometimes are arranged horizontally rather than vertically. So, frequency of vowels: A, E, I, O, U. I don't know where this comes from; maybe they took a random collection of, I don't know, 20 letters, or not 20 letters, 20 words, and looked at what kind of vowels were in those words. Notice this is not a
histogram. A histogram is a very special graph representing frequencies of data. This is not a histogram because here there are gaps in between the rectangles, so that's a way of distinguishing between bar graphs and histograms.

Circle graph. A graphical alternative to the bar graph is the circle graph, or pie chart, which uses a circle to represent all the categories and divides the circle into sectors, or wedges, like pieces of a pie, so pie chart. Now those wedges, their size shows the relative magnitude of the categories. The angle around the entire circle measures 360°. For example, a category representing 20% of the whole should correspond to a sector whose angle is 20% of 360°, or 72°. So for example, this. Now notice that this has nothing to do with frequency; this is a general estimate of Amy's monthly expenses. So you take her monthly expenses; this may not be for a single month, this may be averaged out over a year, so every month they catalog all the expenses in terms of clothing, rent, food, and other things. So 10% is for clothing, 25% for rent, 30% for food. If you look at the clothing, the food is three times the clothing, so the angle of the food wedge should be three times that of the clothing. Anything that is a pie chart can be converted to a bar graph and vice versa. So it's a bar graph of percentages: clothing would look something like this; rent, that's 25%, so that's halfway between 20 and 30; food, that's 30%, so that goes up to here; and then other, who knows what other is, 35%, so that's roughly this. So anything that can be presented as a pie chart can be converted to a bar graph, and vice versa for anything that's a bar graph. Notice that as a bar graph we're not dealing with any frequencies, we're just talking about portions of monthly expenses, so this is definitely not a histogram; a histogram is only dealing with frequencies.

Line graph. If we are interested in demonstrating how a quantity
changes, say with respect to time, we can use a line graph. We connect a series of segments that rise and fall with time according to the magnitude of the quantity being illustrated. So for example, the line graph below represents the stock price of company PCWP over a six-month span. We start off with January; price in dollars is the y-axis. If we look at January, that goes to this dot, and that dot goes to here, so that's roughly, what, $3.50. February, that goes to here, and that's roughly halfway between, so five and a half dollars, and so on and so on. So you can look here and it's like, oh wow, in January it started somewhere between three and four dollars, and then it started to increase in February and March, but oh, look at there, in April it started to go down, it went down one more month, and then oh, look at there, it went back up. What is the time frame reflected in the figure? It is simply January through June. One trick you can use is to consider the slope of the line segments. If it is a shallow slope, little growth; if it's a steep slope, then much growth. If it was a slope like this, that's a positive slope, which means positive growth; if it was something like this, that's going to be a negative slope, which means negative growth. So you can kind of get a sense of certain things about the graph by looking at the slope of each line segment. This is roughly 45 degrees, so, I wouldn't say average, but some growth. Now it's a little shallower for the next segment, so it hasn't grown as quickly as it did the last month. And then, oh looky there, it went from positive to negative, so now it's a negative slope. The next month the slope is roughly the same as the one before, so it's still a constant down. And now it goes back to positive slope, so hey, positive
growth. Something shallow would be something like this or like this; steep, something like this or like this, or anywhere in between. Okay, so that finishes 12.1, so we will now jump into 12.2. Now if we finish this, that's fine, that'll be the end of the day.

More things about statistics: measures of central tendency. Objectives: find and interpret means, find and interpret medians, find and interpret modes, infer central tendency measures from stem-and-leaf displays, discern symmetry in data sets, and compare measures of central tendency.

Measures of central tendency. For a given set of numbers, it may be desirable to have a single number to serve as a kind of representative value around which all the numbers in the set tend to cluster, a kind of middle number, or a measure of central tendency. Three such measures are discussed in this section. The mean, more properly called the arithmetic mean (that's because there are other kinds of means), of a set of data is found by adding up all the items and then dividing the sum by the number of those items. The mean is what most people associate with the word average. The mean of a sample is denoted as x-bar, while the mean of a complete population is denoted as mu. So, the mean of a sample versus the mean of a population: the mean of a complete population is denoted as mu, the mean of a sample is denoted as x-bar. In the realm of statistics we want to be able to separate measures of samples from measures of populations. Measures of samples are often denoted with Latin letters, so x or the letter s. Measures of populations are often denoted with Greek letters: mu is the Greek equivalent of the Latin letter m, so m for mean. There are others, sigma comes to mind, but we won't get too much into that. So when we look at a measure and we say, oh, x-bar, this is talking about a sample, or, oh, mu, this is talking about a population, not a sample. Oh, x-bar, oh, that's talking about a sample,
not a population. So we try to keep measures of samples and measures of populations in their own separate worlds. The mean of n data items, x1 through xn, is calculated as follows. As you might have guessed, this symbol basically means the sum of. This is the capital Greek letter for S, but here we're not talking about a population, we're talking about a function, so this is more of the arithmetic and algebra kind of arena. This is to be read as the sum of. The sum of what? The sum of the x's. So this here is short for x1 + x2 + x3 + ... + xn, all divided by n, the size of this collection.

Ten students in a math class were polled as to the number of siblings in their individual families, and the results were this; this is from an earlier example. Find the mean number of siblings for the 10 students. Solution: here's the formula, and now we're going to take the sum of x, so that's going to be 3 + 2 + 2 + 1 + 3 + 6 + 3 + 3 + 4 + 2, which gives us 29. The n is the 10, so 10 students, there's your n. 29 divided by 10 is 2.9, so the mean number of siblings is 2.9.

Here's a different kind of mean, called a weighted mean. The weighted mean of n numbers x1 through xn that are weighted by the respective factors f1 through fn is calculated as follows. Here we have w-bar; why w? Because w represents the weight; bar because it's a mean. This here is short for x1 * f1 + x2 * f2 + x3 * f3 + ... + xn * fn, and as you might have guessed, this is nothing more than short for f1 + f2 + f3 + ... up to fn. Now on the top, in the numerator, you have those parentheses to clarify that we're talking about x times f for each individual term, and then we're going to add up all these products. In words, the weighted mean of a group of weighted items is the sum of all the products of the items times the weighting factors, divided by the sum of all
the weighting factors. Now one common use for this, at least in the school world, is the grade point average. In a common system for finding grade point average, an A grade is assigned four points, with three points for a B, two points for a C, and one point for a D. Find the grade point average by multiplying the number of units for a course and the number assigned to each grade, and then adding up these products. Finally, divide the sum by the total number of units. This calculation of grade point average is an example of a weighted mean. For example, find the grade point average for the grades below. So, a math class: this person got an A for a 5-unit class, a B for a 3-unit class, an A for a 2-unit class, and a C for a 2-unit class. Solution: here's a new chart, we added a new column. 5 * 4 gives you 20, 3 * 3 gives you 9, 4 * 2 gives you 8, and 2 * 2 gives you 4. Now we're going to take the sums of the second and third columns. For the units, 5 plus 3 plus 4 for the two 2-unit classes, so that's going to be 12; this symbol is called the sum of, so this is a shortcut for just saying the sum. 20 + 9 is 29, plus 8 is 37, plus 4 is 41. So when we find the grade point average, that's going to be 41 over 12, which is 3.42 rounded, so a mid-range B.

Median. Okay, so another measure of central tendency, which is not so sensitive to extreme values. One drawback of means, the arithmetic mean, the average, is this. Here's a number line, and suppose you have a whole bunch of data that's within a particular range, but for whatever reason you have this guy standing way outside the bounds of everyone else. This guy is called an outlier, and if you include the outlier, well, it's like a magnet: the mean is often drawn toward the outliers. I mean, what if you didn't have just one? What if you had several values that are out here? You may have like a thousand, or you may have like a hundred values, but you have four that are way out in left field, in a
manner of speaking. If you took the average of this, it's going to be a particular value, but if you include all these outliers, that's going to skew the mean: in a manner of speaking, the mean is going to be drawn towards the outliers. So there is a significant difference between a mean without outliers and a mean with outliers. Often, when doing actual statistical studies, if outliers do come up, they are handled in a different manner: they may be included, they may not be included. It's all going to depend on how many outliers there are and how much of an influence they have on what you're trying to measure. Medians are not as influenced by outliers.

So another measure of central tendency, which is not so sensitive to extreme values, is the median. This measure divides a group of numbers into two parts, with half the numbers below the median and half above it. So the median is the middle, if you will. To find the median of a group of items: step one, you take all those items and rank them from smallest to largest, or largest to smallest, it doesn't matter; the important thing is that they are ranked. If the number of items is odd, the median is the middle item in the list. If the number of items is even, the median is the mean of the two middle numbers.

So, back to the 10 students and the number of siblings. Now we're going to find the median number of siblings of the 10 students. Solution: this is the raw data, and now you have the ordered data. There are 10 students, which means it's even, so it can very easily be split into two groups of equal size, and there's no particular number to represent the middle, because the middle is going to be somewhere in between the fifth and the sixth numbers. So we're going to take the middle two and take their average, which gives us 2.5. So the median
number of siblings is 2.5. We'll finish off here: position of the median. It's the size of your sample plus one, divided by two, or the sum of the frequencies plus one, divided by two. Notice this formula gives the position and not the actual value, so remember, this is only the position and not the actual value. And with that we are going to stop, so we'll see you tomorrow.
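
As a recap of this section, the mean, weighted mean, and median calculations can be sketched in Python. This is only a minimal sketch and not part of the lecture: the function names weighted_mean, median_of, and median_position are made up for illustration, and the short lists used for the median and the outlier comparison are invented data. The sibling counts and the grade point table are the ones read out above.

```python
def mean_of(items):
    """Arithmetic mean: the sum of the x's divided by n."""
    return sum(items) / len(items)


def weighted_mean(values, weights):
    """Sum of the products x * f divided by the sum of the weights f."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(x * f for x, f in zip(values, weights)) / total_weight


def median_position(n):
    """Position (not value) of the median in a ranked list of n items."""
    return (n + 1) / 2


def median_of(items):
    """Rank the items, then take the middle one (odd n)
    or the mean of the two middle ones (even n)."""
    ranked = sorted(items)
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2


# Sibling counts for the 10 students, as listed in the mean example.
siblings = [3, 2, 2, 1, 3, 6, 3, 3, 4, 2]
print(mean_of(siblings))  # 2.9

# Grade point average: grade points weighted by units (A=4, B=3, A=4, C=2).
grade_points = [4, 3, 4, 2]
units = [5, 3, 2, 2]
gpa = weighted_mean(grade_points, units)
print(round(gpa, 2))  # 3.42

# Invented data (not from the lecture) to compare mean and median
# when a single outlier is added.
without_outlier = [10, 11, 12, 13, 14]
with_outlier = without_outlier + [100]
print(mean_of(without_outlier), median_of(without_outlier))
print(mean_of(with_outlier), median_of(with_outlier))

# For 10 ranked items the median sits at position 5.5,
# that is, between the fifth and sixth items.
print(median_position(10))  # 5.5
```

In the invented outlier data the mean moves from 12 to about 26.7 once the 100 is included, while the median only moves from 12 to 12.5, which is the magnet effect described above.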