Transcript for:
Statistical Concepts and Data Analysis

Okay, welcome back, everyone. We are done with Reading 6, and now we move to Reading 7. Am I audible? Everyone, can you hear me? Yes? Okay. So now we are going to Reading 7, and what is the name of the reading? Statistical Concepts. So we have statistical concepts, and then we'll be looking at various other parameters also. In this reading the major focus is going to be on something that is very crucial in today's world, and that is data: what data is, what the qualitative and quantitative aspects of data are, how to study the data, how to draw inferences from it, and how to analyze it. Remember, an analyst's biggest resource is the data from which he or she draws conclusions, makes projections and so on. How good the data is, how to categorize it, how to present it: all this is what we are going to see in this topic.

Whatever you learn in this topic is going to be very crucial for understanding portfolio management. The bigger version of this topic is nothing but the entire subject of portfolio management. In portfolio management you'll be doing any number of calculations on your portfolio's returns, risk, covariances, correlations and so many other things, and all of this, in an abridged form, is present in this topic. So I've prepared the index, which is very crucial for me. I spend more of my time on the index than on the later parts, because, like the habit these days of checking Google Maps before traveling, you should know where you are going to start, where you are going to end, what the traffic is, and so on. That is why I first look at the index to see what the topic is all about. It's like seeing the trailer, and then I go for
watching the entire movie. That's how it is. So I've split the index into two parts. The first three parts, A, B and C, are about the qualitative aspect of data: how the data is organized, how it is summarized, and how it is presented in a tabular or graphical manner. Parts D, E, F and G are about the quantitative aspect of data, which is all about calculation, and the two main calculations for anyone are the mean and the standard deviation, that is, the return and the risk. Those are the measures of central tendency and the measures of dispersion, and they are going to be very crucial when we do calculations later on. So the later parts are about calculation, and the earlier parts are more about the presentation of data. Finally, in Part G there is a discussion of the distribution of the data. What a distribution is, I'll explain later on, but Part G, the characteristics of a distribution, is going to be very, very important, especially for the topics that follow; they connect to this one through Part G.

So let's take it one by one. The first thing I'm going to focus on is Part A: what is data? Just a couple of years back, the chairman of Reliance Industries, Mukesh Ambani, said one very important thing at the AGM. Before Jio took over as the flagship company, where was Reliance Industries' major revenue coming from, and even till date, what is its major revenue contributor? Crude, right? Oil. And remember, oil is a major commodity not only for Reliance Industries but for many
nations; many nations' wealth depends on oil. But Mukesh Ambani said at the AGM that data is the new oil, the new gold. That is the importance people are giving to data. In the era we live in, we all create some form of data or other, by typing, by tweeting, by posting a chat or a picture. We create a lot of data; this data is studied, and at times misused too, and based on it, companies pitch their products to customers. The problem is that there is plenty of data, too much of it, and you cannot study all of it. The same is true in financial investments. For example, if I want to study stocks today, as I said, you have Thomson Reuters, Bloomberg, S&P, Capitaline and various other platforms through which you can pull out the financial statements of any company you want. You can get the financial statements, the ratios, all the numbers you're looking for, or you may have to do the calculations yourself. So the data is already available in plenty. Analyzing that data in a systematic, quick and easy way that gives you a more productive, more descriptive answer, or rather a better analysis, is the important part, and that is what we are going to learn. For that we require the help of statistics.

So what is statistics? Statistics is nothing but the method you apply to collect and analyze data. There are two types of statistics here: descriptive statistics and statistical inference. Descriptive statistics is when you summarize and describe a large body of data, the mass data, the consolidated data, and statistical inference is when you study a sample of that mass data and use it to draw conclusions about the whole. So descriptive
statistics deals with the mass data, and when you study a sample of that mass data, that is called statistical inference. Why do we do that? Because when this mass data is converted into information, it is too much, and it is difficult to analyze everything. So we have to make judgments about the large data set through statistical inference, that is, by studying a sample. This brings us to two terms: population and sample. A population means the entire data, the mass data, all the members of a group, and a sample is a subset of the population, the members you pick from that group. Clear?

In the US there are more than 10,000 listed stocks, right? If I want to see how the US market is progressing, will I actually go and analyze all 10,000 companies? That's going to be time-consuming, costly, and at times difficult to do, right, everyone? So how do I know whether the US market is doing well? I need to pick a sample of those 10,000 stocks and analyze it. So what do we call the sample of stocks we look at to judge whether the market is doing well or badly? Can anyone give me the name? Index. Yes, an index is a subset of the population. There are 10,000 stocks on US exchanges, and a sample of those 10,000 stocks is the S&P 500. The S&P 500 is a very famous index in the US, just like the Dow Jones 30-stock index. So what is the S&P 500? It is a sample of the 10,000-plus stocks listed on US exchanges. Everyone with me? In India we have two exchanges, NSE and BSE. There are thousands of stocks listed on them, right? But you don't analyze all those thousands of stocks to see whether the markets are doing well or badly. What
you do is focus on the index. The NSE has the Nifty 50; the Nifty is a sample of the population of all the stocks on the NSE. And the Sensex is 30 stocks; the Sensex is a sample of all the stocks listed on the Bombay Stock Exchange. Am I correct, everyone? So we want to study the population, but we study it through a sample. Why do we do that? Because, as I said, studying the population is difficult, time-consuming, costly, and at times not even possible. For example, if I want to study the entire population of a country like India or China, 1.3 to 1.4 billion people, is it possible to study each and every one? No. What happens during an election? Everyone votes, but what do the news agencies do? Do they survey all the people in the nation? No. They run exit polls, which of course are often badly wrong, but how do they run them? Through sampling. Based on a sample of a few people from a particular jurisdiction or constituency, they judge the mood of the nation and come up with a prediction of the election results. Got it? So sampling is a very, very important concept. Clear, everyone, on what a population and a sample are? Any questions here?

Now, any number that describes the population is called a parameter. Remember these words; I'm discussing them because going ahead I'm not going to repeat the description, I'm just going to use the name. So any number or measurement, like the mean and standard deviation that we will see later, when calculated for a population is called a parameter. But when you calculate the same mean and standard deviation not for the population but for a sample, it's called a sample statistic. So a sample statistic is a number for the sample, and a parameter is
a number for the population. Clear? The quantity used to measure a population is called a parameter, and the quantity used to measure a sample is called a sample statistic. Finally, I have written in the note: in practical life, studying the entire population is difficult, costly, time-consuming, and at times not even possible. In such a case we try to draw conclusions about the population by studying a sample. Clear, everyone?

So the next point is the measurement of data: how good the data is, what its quality is, how well it describes things. That is what this part, the measurement scales of data, is about. A measurement scale tells me the quality of the data, what information the data includes. To summarize and analyze data we use various statistical methods; however, before we choose which method to use, we need to distinguish the different measurement scales, because the scale tells us what the data includes. There are four measurement scales: first nominal, second ordinal, third interval, and fourth ratio, and they go in this same sequence: nominal, ordinal, interval and ratio. Nominal is the type of data with the least information, so it is also called the weakest level of measurement. When you go a notch above nominal to ordinal, the data carries more information; from ordinal to interval, it carries more still; and the last scale, ratio, is the strongest form of data, the strongest level of measurement. Clear? So if they ask you which is the weakest, it is nominal; which is the strongest, it is the ratio scale. It goes step by step. Okay, now what is
nominal data? Nominal data is where you have simply divided the data into ranks, sorry, into categories. For example, say there are a hundred students in a classroom and I have just separated them into how many are boys and how many are girls, nothing else, right? Similarly, if there are thousands of mutual funds and I just classify them as growth funds and value funds, or as equity funds and debt funds, that's it, I just classify them. I don't pick out the top 10 or top 20 among those boys or girls, and I don't rank the mutual funds or the students; I simply separate and categorize them. When there is only categorization, the data is the least informative, and that is called the nominal scale. Everyone with me? So what is the keyword in the nominal scale? Only categorization, or classification. Clear?

Now remember my example: I just separated the students into boys and girls, right? But if within those categories I also start ranking them, who is in the top 10, who is in the top 20 and so on, if I make them sit in an order based on their marks, then I have categories plus ranking, and that is called the ordinal scale. In the ordinal scale you categorize and also rank; you put things in some order. Coming to a financial example, apart from the classroom: you have bonds, and you have categorized the bonds but also ranked them, and the ranking for a bond is its credit rating. Let's say you have divided the bonds into 5-year, 10-year and 20-year bonds, and within those categories you have also given them credit ratings: these are AAA bonds, AA bonds, single-A bonds and so on. So
now we have categorized as well as ranked, and that is called the ordinal scale, one scale above nominal. Clear, everyone? Similarly, in mutual funds you have equity, debt and hybrid funds, and within equity you categorize into large cap, mid cap, small cap, micro cap and so on, correct? Within debt funds you have longer-maturity and shorter-maturity funds, low-credit-risk and high-credit-risk funds, and so on. Clear? So what do we have in nominal? Only categorization. In ordinal you have categorization plus, let me hear the keyword, ranking. Ranking, right.

But when I say large cap funds, I have only said that out of a thousand mutual funds, 500 are equity and 500 are debt, and within those 500 equity funds, these are large cap, these are small cap, these are mid cap and so on. That's it. I have not said anything about the gap between the biggest large cap fund and the next one, or what constitutes a large cap. I need to say that, right? For example, a company with a market cap of more than, say, a billion dollars is large cap, one between 500 million and a billion dollars is mid cap, and one below 500 million is small cap. So now I need to put some scale, some measured value, in between. Similarly, I could say that all the bonds falling within a certain range of values go into category A, B, C and so on. So if nominal was only categorization and ordinal was ranking, in the interval scale I also need to put numbers between the ranks. I should know the score of the person in rank 1 and the difference between the scores of rank 1 and rank 2. I need to know the score; I cannot simply say, okay, these are shortlisted. I need
to have the score difference between rank 1, rank 2, rank 3 and so on. So what is the interval scale? It is where the data is not only ranked, but the difference between scale values is measurable, so scale values can be added or subtracted meaningfully. For example, if I start at 4 and add 2, I get 6; if I subtract 2, I get 2. I know the size of each step, and accordingly things can be placed in different categories. The example given here is temperature: the difference between 2 and 4 degrees is 2 degrees, and the difference between 8 and 10 degrees is also 2 degrees, right?

However, there is a problem with the interval scale: it does not have a true zero point. Zero on this scale does not mean the absence of the thing being measured. When I say the temperature is zero, does it mean there is no temperature? No, there is still a temperature, right? Zero is just a point on the scale. But in many real-life measures zero does mean something. When I say the money you have is zero, it means you actually don't have any money; you are penniless, you can't buy or sell anything. When I say the return on your investment is zero, it means you have earned nothing. And when I say your portfolio's value has become zero, it means you have actually lost everything. There, zero means something; it does not mean nothing. Are you getting it? So the ratio scale is the one
which has all the characteristics of nominal, ordinal and interval, but also has a true zero point. Are you getting me? So the ratio scale is the type of data that you and I, as analysts, use to analyze and predict stocks, because it is the most informative data. For example, a positive return means you've made something, a negative return means you've lost something, and a zero return means there is no return; every value has a meaning. Similarly, money is said to be the best example of a ratio scale, because zero money also means something: it means I have nothing in my bank account. So when zero also has a meaning, we call it a ratio scale; the most informative data is on the ratio scale.

So remember this. The nominal scale comes first; it is the weakest form of data, and its keyword is categorization, or classification. Second comes ordinal, one step ahead, which shows categorization plus rank. Then one step further is the interval scale. In the interval scale, guys, I get categorization, rank, and the value between the ranks, the measured gap between rank A and rank B and so on. The name itself says it: there is an interval, a number, between two ranks. Got it? But the problem with the interval scale is that zero does not have a true meaning. When zero also means something, that is the strongest form of data, the ratio scale. Everyone clear? So quickly repeat with me the four types of data in sequence: nominal, ordinal, interval, ratio. You have to remember them in sequence, because nominal is the weakest form, then comes ordinal, then interval and
then finally ratio, right? So, a question: can you explain the zero point? Okay. Actually, the only difference between the interval scale and the ratio scale is the meaning of zero. On an interval scale, zero is just a point on the scale and does not mean "none", but on a ratio scale zero is a true zero. For example, when I say the return is zero percent, it means I have not gained anything; when I say the money is zero, of course I mean I have no money. But on an interval scale, zero degrees does not mean there is no temperature. Temperature is the best example we all have. The difference between 2 and 4 degrees is the same 2 degrees as the difference between 10 and 12 degrees, so differences are meaningful, but because zero is not a true zero, you cannot say that 20 degrees is twice as hot as 10 degrees. The interval scale just gives you the numbers and the differences between them; the ratio scale goes a step further, because with a true zero, ratios also make sense: $200 really is twice as much money as $100. So the ratio scale is the most informative. Nominal just gives you the category, ordinal gives you the ranking, and interval tells you the difference between rank 1 and rank 2, but nothing beyond that; if you need more from the data, that is what the ratio scale gives you. So, to conclude again: if a scale only tells you that the first hundred
companies are large cap and the next hundred are mid cap, that only gives you categories and order. What actually makes a company large cap is its market capitalization, a measured value with a true zero, and if you need that information beyond the rank, you go to the ratio scale. Of course, in the exam you won't have to write out a description of the difference between interval and ratio. What you need to remember is this zero-point example, because that is what they frequently ask: when zero also has a meaning, that is a true zero point and the data is ratio; if not, it is interval. Okay, so that is all about data: how you qualify it and how you categorize it. The more the information, the stronger the scale.

Now, let's say I'm taking data on an entire population. During COVID, governments had to collect data on how many people were infected, how many were to be vaccinated and so on, right? Similarly, if I want to analyze data on a financial security like a stock, say its returns or share prices over the last 10 or 20 years, there will be so many prices, so much data, right? If I put everything in an Excel sheet and start analyzing each and every number, it will not make sense; it will be very time-consuming and a daunting task. So what I need to do is present the data properly, in a tabular or graphical manner, and when I present the data in a tabular manner, that is called a frequency distribution. In a frequency distribution, guys, I organize the data in a systematic manner and present it in a table. It is a tabular display of the data and helps in
analyzing a large data set. It is a summarized version where you put the data into groups. So what are we going to do? We are going to build a frequency distribution, that is, summarize the huge data into a table, into different intervals or blocks. I have put very detailed steps for a frequency distribution here. You don't have to memorize the steps, don't worry; we'll come to the steps and the important points about them later. Before that, I will go directly to a question and connect you with the steps through it, because after that it will be much, much easier.

So what do you see in the example? You are told: sort the following returns data in ascending order and create five intervals. What do you mean by an interval? Anyone? A category, a range on the scale, right? Yes. They have given me some data; let's count how many numbers we have: one, two, three, all the way up to sixteen. So there are 16 returns. You can assume these are the past 16 yearly returns of a stock: in the first year you got 2 percent, in the second year 7 percent, and so on. These are just for reference; since there are 16 data points, they could be 16 years, 16 months, or anything. When you get raw data, it is very raw, so you need to organize it, and the first step is to arrange it. How are you going
to arrange it? You can arrange it in order, either ascending or descending. So the first step is to arrange the data, and I've arranged it in order. Are you with me, everyone? Here, guys, I have arranged it in ascending order. Of all these numbers, the lowest was minus 10, so I've put minus 10 first, and the highest was plus 10, so I've put plus 10 last, with all the other numbers in between in order. That is my first step: sort the data, in ascending or descending order. The second step is to see how wide the data is, so you calculate the range. How do you calculate the range, anyone? The range is nothing but the difference between the largest and the smallest value, right? Yes. Here the largest number was plus 10 and the smallest was minus 10, so the range is 10 minus (minus 10), which is 20. Clear, everyone? So I have 16 numbers, they run from minus 10 to plus 10, and the range is 20.
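A minimal Python sketch of the first two steps (sort, then range), assuming the 16 returns recovered from this walkthrough. The transcript gives the values but not the slide's original order (only that the first two years were 2 percent and 7 percent), so the order of the rest of the list below is hypothetical.

```python
# Sixteen yearly returns in percent, as recovered from the lecture.
# Only the first two (2, 7) follow the slide's order; the rest of the
# ordering is hypothetical because the slide is not in the transcript.
returns = [2, 7, -4, 10, 1, -6, 3, 0, 9, -10, 2, 5, -3, 1, 6, -2]

# Step 1: arrange the raw data in ascending order.
sorted_returns = sorted(returns)

# Step 2: range = largest value minus smallest value.
data_range = sorted_returns[-1] - sorted_returns[0]

print(sorted_returns)  # [-10, -6, -4, -3, -2, 0, 1, 1, 2, 2, 3, 5, 6, 7, 9, 10]
print(data_range)      # 20
```

Whatever the original order, sorting gives the same list, so the range of 20 does not depend on the hypothetical ordering.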
Now I need to put them into categories. I have the extreme values, from minus 10 to plus 10, and 16 numbers in total, and they have told me to create five intervals, five categories. The total range from lowest to highest is 20, and I have to split it into five categories, so each category will have a width of 4. Yes? That is called the interval width: the span of the interval, from where it begins to where it ends. The lowest number I have is minus 10 and the width has to be 4, so the first interval is minus 10 to minus 6, the second is minus 6 to minus 2, the third is minus 2 to plus 2, the fourth is 2 to 6, and the last is 6 to 10. So I arranged the data, calculated the range, and from the number of intervals I got the interval width. The third step was deciding on the number of intervals, but in a CFA exam question they will give you the number of intervals; you don't have to choose it yourself. Based on that number you decide the width, and the width is the total range divided by the number of intervals they ask for. The width is the gap between the interval limits. Everyone clear with the first four steps? Yes? Right.

Now starts the real job: you have to place every number into the category where it falls, and that is where an important technique comes in. Focus on the numbers you arranged in order. Where will minus 10 be placed? Guys, minus 10 goes in the first category, the first one, so I place minus 10 here. What about the second number after minus 10? The
second number is minus 6, and now comes an important thing. Where should I place minus 6, in the first category or the second? Second. And why is that? Focus here: I have written the upper limit as "less than minus 6". This is the very important thing: when you set up an interval, you have a lower limit and an upper limit. Here the lower limit is minus 10, and notice that for the lower limit I have used the "less than or equal to" sign, so minus 10 itself is included. But what do you notice about the upper limit, guys? For the upper limit I have used only "less than", so the interval includes anything less than minus 6 but not minus 6 itself. Are you with me on this, everyone? Yes, sir. So minus 6 falls in the next category. Then you have minus 4; minus 4 is also in that category. How about minus 3? Same category, correct. Minus 2: this interval or the next? Next. So minus 2 goes into the next interval, and then you have 0, then 1 twice, so 1 also goes in this interval, correct? Then 2: this interval or the next? The next. So you have 2 twice, then 3, then 5. Should I put 6 in this interval or the next? The next. So 6, 7, 9, and now comes the very important thing. If you look at the last interval, I have not written only "less than" for the upper limit; I have written "less than or equal to". Why? Because otherwise 10 would have to go to a next interval, and they have told me I can have only five intervals. I need that last number to be in this category, because after it there is no further interval. So in all the previous intervals the upper limit was "less than", but in the last one the upper limit is "less than or equal to". Everyone with me?
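The placement rule just described can be sketched in Python as follows. This is a minimal illustration, not a prescribed method: the helper name `interval_index` is hypothetical, and the data reuses the 16 returns from the example in a hypothetical order. Each interval uses "less than or equal to" at the lower limit and "less than" at the upper limit, except the last, whose upper limit is also "less than or equal to" so that the maximum value (10) is counted.

```python
# The 16 yearly returns (%) from the example; the order is hypothetical.
returns = [2, 7, -4, 10, 1, -6, 3, 0, 9, -10, 2, 5, -3, 1, 6, -2]

k = 5                                  # number of intervals given in the question
low, high = min(returns), max(returns)
width = (high - low) / k               # 20 / 5 = 4.0

# Interval limits: -10, -6, -2, 2, 6, 10
edges = [low + i * width for i in range(k + 1)]


def interval_index(x):
    """Return the 0-based interval that x falls into (hypothetical helper).

    Every interval is lower <= x < upper, except the last one,
    which is lower <= x <= upper so that the maximum value is counted.
    """
    for i in range(k):
        lower, upper = edges[i], edges[i + 1]
        is_last = i == k - 1
        if lower <= x < upper or (is_last and lower <= x <= upper):
            return i
    raise ValueError(f"{x} lies outside [{low}, {high}]")


# Absolute frequency: how many observations fall in each interval.
counts = [0] * k
for x in returns:
    counts[interval_index(x)] += 1

for i, c in enumerate(counts):
    closing = "]" if i == k - 1 else ")"
    print(f"[{edges[i]:g}, {edges[i + 1]:g}{closing}: {c}")
```

Running it prints one line per interval, `[-10, -6): 1`, `[-6, -2): 3`, `[-2, 2): 4`, `[2, 6): 4` and `[6, 10]: 4`, matching the counts in the lecture. Closing only the last interval is the same convention NumPy's `histogram` uses: all bins are half-open except the right-most one.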
So that is why 10 falls in this category. If there were a sixth interval, 10 would have gone to the next one, but the last interval should always use "less than or equal to" not only for the lower limit but also for the upper limit. Why? Because if I don't do that, 10 will not fall into any of the five intervals. Got it? Everyone clear? Any questions here? So what did we do? We were calculating absolute frequencies, the actual count of numbers in each interval. There is one number in the first interval, so I write 1 here; there are three in the second interval; then four in the third, four in the fourth, and, by coincidence, four in the last one too. Everyone with me? These are called absolute frequencies.

But can someone tell me what happens at the limits of the first and the last interval? A very important point. Anyone? In every interval, the lower limit uses "less than or equal to". The upper limit uses only "less than" in every interval except the last; in the last category, the upper limit also uses "less than or equal to", otherwise your numbers will not all fall within the intervals. Everyone clear? Whenever something has a name, CFA Institute will ask you theoretical questions on it, not only on the frequency calculation, which you have all done in your graduation. CFA Institute asks more about the analytical part, which is why I'm putting more stress on it. The lower limit is included in the interval, so when I say the lower limit is minus 10, the interval actually includes minus 10. Whenever the limit itself is included, it is called a weak inequality. So whenever you
have a less-than-or-equal-to sign, it is called... what is it called, guys? Weak inequality. Weak inequality; remember this name, this name is crucial here. So weak inequality is when that particular number is also included in the interval, that is, less than or equal to. But when the number is not included, it is called strong inequality: only numbers less than that number are included. So whenever you have only the less-than sign, it is called strong inequality. Everyone clear? I will highlight it here: weak inequality whenever there is a less-than-or-equal-to sign; strong inequality whenever there is only a less-than sign, so the number is not included. Weak inequality means included, strong inequality means not included. Everyone with me? So, in the last interval, is the upper limit a weak inequality or a strong inequality? Weak. Yes, it is a weak inequality because it has the less-than-or-equal-to sign. But the upper limit in all the intervals before the last one is a strong inequality. Why am I doing this? Because CFA Institute has asked questions on this. They're smart enough; they know that your focus here will be on the calculation, not on this. They know that is where students focus, so they put more stress on the things in between. Clear? So weak inequality means the number is included; strong inequality means it is not included. Clear? Now, we calculated the absolute frequency, and next we are going to calculate the cumulative frequency and put everything into a table. Those are the next steps: count the frequencies, put the frequencies into the categories, and then present them in a table. We know the total of these absolute frequencies should be 16, because there were 16 years of data. Correct? So how do I now calculate the cumulative frequency, guys?
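A minimal Python sketch of this rule, assuming the interval limits from this example (minus 10 to plus 10 in widths of 4). The names `BOUNDS`, `interval_index` and `label` are illustrative, not from any library: each interval uses a weak inequality at the lower limit and a strong inequality at the upper limit, except the last, which is weak at both ends so that the maximum value (10) is still counted.

```python
# Interval limits from the example: -10 to +10 in widths of 4 (five intervals).
BOUNDS = [-10, -6, -2, 2, 6, 10]


def interval_index(x, bounds=BOUNDS):
    """Return the 0-based index of the interval that contains x.

    Every interval is lower <= x < upper (weak inequality at the lower limit,
    strong inequality at the upper limit), except the last one, which is
    lower <= x <= upper so that the maximum value is still counted.
    """
    last = len(bounds) - 2
    for i in range(last + 1):
        lower, upper = bounds[i], bounds[i + 1]
        if i == last:
            if lower <= x <= upper:   # weak inequality at both ends
                return i
        elif lower <= x < upper:      # weak lower limit, strong upper limit
            return i
    raise ValueError(f"{x} is outside [{bounds[0]}, {bounds[-1]}]")


def label(i, bounds=BOUNDS):
    """Describe interval i with its inequality signs."""
    upper_sign = "<=" if i == len(bounds) - 2 else "<"
    return f"{bounds[i]} <= x {upper_sign} {bounds[i + 1]}"


for value in (-10, -6, 2, 10):
    print(value, "->", label(interval_index(value)))
```

In this sketch minus 10 lands in the first interval, minus 6 in the second (its lower limit is included), 2 in the fourth, and 10 in the last interval only because that interval is closed at the top.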
Cumulative: what is cumulative frequency? It includes everything up to that point. So in this case the cumulative frequency for the first interval will be 1. What will it be for the second one? 1 plus 3, which is 4. For the third one it will be 4 plus 4, which is 8. Correct? The fourth one will be 8 plus 4, so I'll have 12, and the last one, of course, has to be 16 because there are 16 observations: 12 plus 4. Clear, everyone? Cumulative means everything included up to that point. So if they ask me the cumulative frequency for the minus 2 to plus 2 interval, it includes not only that interval but also the previous ones: 8. Correct? And remember, the total of all the absolute frequencies should equal the cumulative frequency of the last category; here it should always be 16 years. Everyone with me on this? These are absolute frequencies. Now, what is relative frequency? Relative frequency is when you convert those counts into percentages. Out of 16, how many observations do you have in the first interval? Only one, right? So how do you convert this into a percentage? You do 1 divided by 16, so you get 6.25 percent. Are you with me, everyone? How about the second one? What should I do, 4 divided by 16? No, no, not the cumulative; take the absolute one: 3 divided by 16, which is 18.75 percent. But when you go to the cumulative, 18.75 plus 6.25 becomes 25 percent. Clear? So basically, if you notice, the absolute frequency is in numbers and the relative frequency is in percentages, and the cumulative absolute and cumulative relative frequencies are connected. Got it? So now the third one will be 4 divided by 16, which is also 25 percent; I already had 25, so the cumulative now becomes 50. Here also it is 4 divided by 16, so the cumulative becomes 75 percent, and the last one will also be 4 divided by 16. It's a coincidence that it is 4 again; it's not always going to be the same. So the last cumulative relative frequency has to be 100 percent.
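The same table can be reproduced in a short Python sketch, using the absolute frequencies from this example (1, 3, 4, 4 and 4 out of 16 years). The helper name `frequency_table` is hypothetical; the point is that cumulative frequency is a running total and relative frequency is each count divided by the total.

```python
# Absolute frequencies from the example: 16 yearly observations in 5 intervals.
intervals = ["-10 to -6", "-6 to -2", "-2 to +2", "+2 to +6", "+6 to +10"]
absolute = [1, 3, 4, 4, 4]


def frequency_table(absolute):
    """Return cumulative, relative (%) and cumulative relative (%) frequencies."""
    total = sum(absolute)
    cumulative, relative, cumulative_relative = [], [], []
    running = 0
    for count in absolute:
        running += count                              # everything up to this interval
        cumulative.append(running)
        relative.append(100 * count / total)          # this interval's share of the total
        cumulative_relative.append(100 * running / total)
    return cumulative, relative, cumulative_relative


cumulative, relative, cumulative_relative = frequency_table(absolute)
print("interval   | abs | cum | rel %  | cum rel %")
for name, a, c, r, cr in zip(intervals, absolute, cumulative, relative, cumulative_relative):
    print(f"{name:<10} | {a:>3} | {c:>3} | {r:6.2f} | {cr:6.2f}")
```

The last cumulative frequency equals the total (16) and the last cumulative relative frequency is 100 percent, which is a quick check that no observation was missed.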
Clear, everyone? Does anyone have any doubts here? Everyone clear with absolute frequency, cumulative frequency and relative frequency? Absolute is in numbers, relative is in percentages. Got it? Any questions, anyone, till this point? Everyone clear with the interval scale, and the importance of weak inequality and strong inequality? Understood the concept? What does this do? Now everything is right in front of me in tabular form. Everyone with me? Just imagine there is too much data: all of it can be categorized and presented in a shorter, tabular way. So now when you go with these numbers to your management, they will find it more interesting and easier to read. Now, guys, if I want to move from a tabular presentation to a graphical one, just note what I do here. Sorry. Okay, so here on the vertical axis I am putting the frequencies, and here I will put the intervals. Remember, the first interval we had was minus 10 to minus 6, correct? And the frequency in it was 1, so I'll put 1 here. I'm making a bar chart; I'm trying to present this in chart format. What was the second frequency, guys? The second frequency was 3, so I'll draw this one up to 3. Just bear with me and my excellent handwriting. This is the second category, which was minus 6 to minus 2, so I will write minus 6 to minus 2 here. Then I had the third category, which was minus 2 to plus 2, then the fourth, from plus 2 to plus 6, and then plus 6 to plus 10.
Now, I know that in all the remaining categories the frequency was 4, so if I draw 4 for each of them in this manner, I get a bar chart. Correct? This is a bar chart where on the y-axis you have the frequencies and on the x-axis you have the interval scale. Everyone with me? This is called a histogram. What is this bar chart of all the frequencies called? It is called a histogram; it is the graphical presentation of the frequency distribution. Okay? And now, if I put a trend line on it... how do I put a line on this? I just select the midpoints of the intervals, and if I draw a trend line through them, it tells me in which direction the data is going. This is called a frequency polygon. What is it called? Frequency polygon; I'll describe it. When you see the bar chart, it is called a histogram, and the trend line that you see is called the frequency polygon. Everyone? So that is where you take all the tabular data: if there are too many tables, say hundreds of tables, how are you going to read each and every one? If you want to simplify much more, you do a graphical presentation like this one. This is not from the previous example that we did; this is just a random example. Everyone? So what are the numbers on the y-axis? These must be the frequencies. Correct. The values from 0 upward on the y-axis are the frequencies, and what you see right at the bottom are the intervals, or categories. Clear, everyone? So what do you notice here? Let's assume these are the marks of the students in my batch; by God's grace, inshallah, you should all get these marks. Where do you see that the maximum number of students have scored? Around 70 percent. Right. So how do I come to know that? From the height of the bar. Can I say the interval with the highest frequency will have the biggest Burj Khalifa there? Correct?
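For the frequency table built earlier in this lecture (not the marks chart), here is a rough, text-only Python sketch of both pictures; a real chart would need a plotting library, which is not assumed here. The helper names are hypothetical: `histogram_bars` uses the absolute frequency as each bar's height, and `frequency_polygon` pairs each interval's midpoint with its frequency, which are the points the trend line joins.

```python
bounds = [-10, -6, -2, 2, 6, 10]   # interval limits from the example
absolute = [1, 3, 4, 4, 4]         # absolute frequency per interval


def histogram_bars(bounds, absolute):
    """(lower, upper, height) for each bar; height = absolute frequency."""
    return [(bounds[i], bounds[i + 1], absolute[i]) for i in range(len(absolute))]


def frequency_polygon(bounds, absolute):
    """Points the trend line joins: (interval midpoint, absolute frequency)."""
    return [((bounds[i] + bounds[i + 1]) / 2, absolute[i]) for i in range(len(absolute))]


# Text-only "histogram": one '#' per observation in each interval.
for lower, upper, height in histogram_bars(bounds, absolute):
    print(f"{lower:>4} to {upper:>3} | " + "#" * height)

print("Frequency polygon points:", frequency_polygon(bounds, absolute))
```

The bar heights are the absolute frequencies, not the cumulative ones, and the polygon points sit at the midpoints minus 8, minus 4, 0, 4 and 8.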
So the bar chart represents the frequency, and the interval with the highest frequency will have the tallest bar, so it is clearly visible there. Correct? So from this I can read it directly, instead of going through the raw data. Let me now collect everything. You had raw data. You categorized the data into measurement scales: nominal, ordinal, interval and ratio. After classifying the data into different scales, you pooled that data, and because there was too much of it, you converted it into a tabular presentation, that is, a frequency distribution. After the frequency distribution, if you just want to see it in picture format, you use a graphical presentation, which is called a histogram. So what is a histogram? It is a bar chart of data that have been grouped into a frequency distribution. Whatever you got in the frequency distribution is what you see here in the histogram. The height of each bar represents the absolute frequency, not the cumulative frequency. Correct, guys? These are the questions the Institute may ask: what does the height of the bar represent, absolute or cumulative frequency? You will say absolute frequency. Right, everyone? And if I connect the midpoints of the intervals (see this), if I connect the midpoints of all the intervals at their frequencies and form a trend line, that trend line is called the frequency polygon. The frequency polygon tells you in which direction the distribution is moving: here it increases initially and then goes down. Correct? Everyone clear? One word that should come into your mind with histogram is bar chart; one word that should come to your mind with frequency polygon is trend line. Everyone with me on this? Now let me show you the same thing I've done in an Excel sheet; you can also try it. Let me know if the Excel screen is visible. I think it will be visible in a while. Yeah, is my screen visible? Is the
Excel sheet visible, guys? Yes. Yeah. So, when you go to Excel, it is not done in such an easy manner. I am just showing this as an extra; it is not part of your syllabus. If you see, I just pulled out all those numbers, and there is a lot of calculation: you have to find the maximum, the range and the interval width, and setting this up in Excel is altogether a different task. In Excel you have something called bins. Bins are basically the boxes of the scale, so the intervals that you see are the bins. I have put in the lower and upper limits, and based on that I have set up all the interval categories here. For each interval category I've calculated the absolute frequency and then the cumulative frequency, and so on; you can do the relative frequency also, and the chart will look the same. So what you see on the left of the screen is the absolute frequency, which, if you notice, is the same as what I did manually: 1, then 3, 4, 4 and 4. And the trend line I've put there is the frequency polygon. Correct? But notice that the cumulative frequency never goes down. Why? Because the first one is 1, and in the second you add the first as well. Cumulative frequency is like steps: it only stays level or keeps increasing. Got it? CFA Institute won't ask you anything about the cumulative distribution chart; I've just put it here, absolute and cumulative. Cumulative will always be increasing; remember that. Why? Because it includes the previous intervals as well. Clear? Yes, everyone? Okay, I'll move to our notes now. Let's go back to the notes. So this is the histogram and frequency polygon. Now, if you check here, guys, there are two things we want to do in this topic: the qualitative aspect and the quantitative aspect. In the qualitative aspect we first discussed data and why data is important. In statistics, we discussed the nature of statistics
as descriptive statistics and statistical inference; basically, you are studying the population and the sample. Then we quickly did the measurement scales: nominal, ordinal, interval and ratio. Ratio is the strongest and nominal is the weakest. Then you took all this raw data and put it into a proper format, that is, organizing the data, which is the frequency distribution. In the frequency distribution you had absolute and relative frequencies. The important thing in the frequency distribution, guys, was the interval. How do you get it? The first step is to arrange the data, then look at the range, and based on the number of intervals they have given you, you get the width of each interval. For example, here my range was 20; had they told me 10 intervals, the width of each interval would have been 2: minus 10 to minus 8, minus 8 to minus 6, and so on. So they specify the number of intervals, and you get the width of each interval as the range divided by the number of intervals. Once you set up all the intervals, the upper limit will be less than or equal to, right? Less than or equal to is also called, guys, weak inequality. Sorry, my mistake: all the upper limits will be strictly less than, and less than is called, yes, strong inequality. Only the last interval's upper limit will be less than or equal to, and less than or equal to is called weak inequality. Why is it called weak? Because it includes that number as well; strong means it does not include it, it is only less than. That is the one theoretical aspect the Institute may ask about; otherwise the calculation part is very easy. You just look at the numbers and put them into the frequency table, you do the cumulative, and if you want to convert that into percentages, it is the relative frequency; you can also do the cumulative relative frequency.
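The width rule can be sketched in Python, assuming equal-width intervals that start at the minimum of the data; `interval_bounds` is a hypothetical helper. With the range of 20 (minus 10 to plus 10), five intervals give a width of 4 and ten intervals give a width of 2, matching the minus 10 to minus 8, minus 8 to minus 6 pattern.

```python
def interval_bounds(minimum, maximum, k):
    """Split [minimum, maximum] into k equal-width intervals.

    width = range / k; returns the width and the k + 1 interval limits.
    """
    width = (maximum - minimum) / k
    limits = [minimum + i * width for i in range(k + 1)]
    return width, limits


# The lecture's data: minimum -10, maximum +10, so the range is 20.
print(interval_bounds(-10, 10, 5))    # five intervals
print(interval_bounds(-10, 10, 10))   # "had they told me 10 intervals"
```

The first and last limits are the minimum and maximum of the data, so the last interval needs the weak inequality at its upper limit to keep the maximum inside.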
Based on the same tabular presentation, you can draw a bar chart and a trend line, which are called the histogram and the frequency polygon; that's the graphical presentation of the data. Clear, everyone? Any doubts, any questions till this point? Okay, this was the qualitative aspect of data. Now we move to the quantitative aspect of data, and the quantitative aspect of data is something you are going to do throughout your life from now on. You are going to be in the investment industry, where you will always be calculating returns and risk for your clients, and that return and risk now come right into the picture. That is called the measurement of the data, and the first thing we are going to do is the measures of central tendency. There is a lot of data, and what you have right in the center is called the average. Statistically, what do we call the average? The mean. Of course, you also have the median and the mode, and we will discuss the difference between mean, median and mode, but statistically the average, for example the average return over the last 10 years, is called the mean. So now we are going to understand central tendency, that is, the average, which is nothing but the mean. But what are the different types of mean, and which mean is calculated when? That's the difference between CFA and other curricula. In other curricula you calculate what you are told. In CFA you are of course going to calculate, but you are also going to understand why you are doing so; in CFA the why is emphasized more than the how. When I do CFA trainings, I see students focusing more on the calculation, but CFA Institute is smart enough to test not your mechanical skills but your analytical skills. So why are we studying measures of central tendency? We are studying them to understand what is the
average return, and how much this return can deviate, which is risk, which we will study in the measures of dispersion. So measures of central tendency are for returns, measures of dispersion are for risk, and then we will study a ratio that combines risk and return. Step by step; right now we are on measures of central tendency. Where are they used? They are used for quantitative data. A measure of central tendency helps us characterize the data: it tells us where the data is centered. It is used a lot in the investment industry, guys, and it is very easy to compute. Under central tendency we are going to see different types of mean. The first one I will focus on is population versus sample mean. Then you have arithmetic versus geometric mean; I'm sure you have probably heard of these. Then you have the weighted mean, which is nothing but the weighted average. When will I use a weighted average? When I have weights for the securities along with their returns; so a weighted average is nothing but a portfolio return. And the new thing, which you may not have heard of before, is the harmonic mean. So there are different types of means that we are going to see. First of all, guys, tell me: what is the difference between a population and a sample? The amount of data. Yes: the population includes everything, and the sample includes just a smaller part of the population. It's a subset of the population, right? Okay. So the arithmetic mean is the mean we actually know. What is the arithmetic mean? Let's say you have five years of returns. You take the total of all the returns and divide by n, the number of years; that is your simple average return. How do we all calculate an average return? We take the total of all the numbers and divide by the number of years, or the number of observations. That simple average is statistically
called the arithmetic mean. Everyone? So the arithmetic mean is the simple average of the data. It can be calculated for the population as well as for the sample; the only difference between the population and the sample, guys, is the size: the population is larger, the sample is smaller. Clear? If I calculate the arithmetic mean for the population, it is called the population mean; if I do it for a sample, it is called the sample mean. Correct? Here I have given an example: the arithmetic mean of 2020 earnings of all the companies listed on the LSE, the London Stock Exchange, is a population mean. But if I take, say, the 20 most valuable oil companies globally, maybe Exxon, Saudi Aramco, Shell and so on, out of so many oil companies, I am focusing on a sample. Correct? And remember: a number that you calculate for a population is called a parameter, and a number that you calculate for a sample is called a sample statistic. Remember this. Whenever you calculate a population mean, you are calculating a parameter, and whenever you calculate a sample mean, you are calculating a sample statistic. Clear? How do we calculate it? Simple, guys, I told you: the summation, the total of all the observations, divided by the number of observations. Everyone? But the population mean is denoted by mu, μ. Whenever you see this sign, μ, the Greek letter mu, think of it as the m of mean: mu means the population mean. How is the sample mean denoted? The sample mean is denoted X bar. Everyone with me? So mu means population mean and X bar means sample mean. Everyone clear with this? Correct? Yes, sir. So that is the simplest one, what we all know as the arithmetic mean.
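The population mean and the sample mean use the same formula; only the data set differs. A minimal Python sketch with made-up numbers (the eight fund returns below are hypothetical, chosen only for illustration): `mu` is computed over the whole population, so it is a parameter, and `x_bar` over a subset, so it is a sample statistic.

```python
def arithmetic_mean(values):
    """Sum of the observations divided by how many there are."""
    return sum(values) / len(values)


# Hypothetical yearly returns (%) for a whole population of eight funds.
population = [4.0, 6.0, -2.0, 8.0, 5.0, 3.0, 7.0, 1.0]
# Hypothetical sample: four of those eight funds.
sample = [4.0, -2.0, 5.0, 7.0]

mu = arithmetic_mean(population)   # population mean, a parameter
x_bar = arithmetic_mean(sample)    # sample mean, a sample statistic
print(f"mu = {mu}, x_bar = {x_bar}")
```

Here the sample mean (3.5) differs from the population mean (4.0), which is why the two numbers carry different names and symbols.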
Now, how do we calculate the geometric mean, the other mean? I will show you, but before that let me take this example here, guys. We'll do the example in two halves, because we have to do both arithmetic and geometric: first we will do the arithmetic mean and then the geometric. Here I have given you four returns, for years one, two, three and four. How are you going to calculate the arithmetic mean, guys? Simple: in your calculator you add all four returns and divide by 4. Right? Can you calculate that quickly? 2.8 plus 5.6 minus 3.9 plus 1, then press equals; you'll get 5.5. When you divide 5.5 by the number of years, four, what is the return? Please tell me, guys. 1.37? 1.37, okay; can you give it to more decimals, because there will be minor differences? 1.3750. Correct. So 1.375 percent is my average return; that is the simple arithmetic mean. What did I do here? I took the summation of all the returns divided by the number of years. Clear, everyone? However, the geometric mean is nothing but the compounded return. For example, how do you calculate the compounded return of 5 percent for two years? You do it in the form of a factor: 5 percent on one dollar gives 1.05, on 100 gives 105. So 5 percent on 1 is 1.05; that's the factor for one year. If the next year is also 5 percent, you multiply 1.05 by 1.05, which is nothing but 1.05 squared. That's if the returns are the same; if they are not the same, you do it one by one. So here, if the first-year return is 2.8 percent, how am I going to write 2.8 percent geometrically? I'm going to write, guys, 1 plus 2.8 percent, which is 1.028. Everyone with me? 2.8 percent on 1 is 1.028. Correct? How do I write 5.6 percent? Can someone tell me: how do I convert a 5.6 percent return into a
factor? 1 plus 0.056: 1.056. So now you multiply here; you are not adding, because you are working with factors, you're compounding. So I will multiply this by 1.056. Clear? Right now, don't do the calculation; just focus and tell me the numbers. The third one is interesting: it is not plus 3.9 percent, it is minus, so from 1 you subtract 3.9 percent: 1 minus 0.039. Tell me the answer. Yes, what is it? 1 minus 0.039 is 0.961. Right? Yes, 0.961. Correct, everyone. The last one is plus 1 percent, so adding 1 percent to 1 gives 1.01. Clear, everyone? Now, in your calculator, start: 1.028 multiplied by 1.056 multiplied by 0.961 multiplied by 1.01, press equals, and tell me what you get. 1.054? 1.0537, guys. Everyone, please set your calculator to at least four decimals; if you have not done that, press 2nd, then the decimal-point key, then 4 and ENTER. I want the entire class to give me consistent answers, so don't conveniently use two decimals here and three decimals there; let's have consistency, at least four decimals in your calculator. You can send me two or three decimals in the chat, that's fine, but your calculator should show at least four. So everyone is getting something like 1.0537. But guys, this is the total four-year growth; I need the average. How did you do the average in the arithmetic mean? How did you get the total there? You added everything, right? Here we are multiplying everything, so the average is done in the same way as the compounding. In compounding you raise to the power 4 for four years; but to take the average you raise to the power 1/4. So instead of raising to 4, I have to raise this to the power 1 over 4.
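Both averages for the four yearly returns in this example (2.8, 5.6, minus 3.9 and 1 percent) can be checked with a few lines of Python. This is a sketch, not a replacement for the exam calculator: `growth ** (1 / n)` does the "raise the total growth factor to the power 1 over n" step.

```python
returns = [2.8, 5.6, -3.9, 1.0]   # yearly returns in %, from the example
n = len(returns)

# Arithmetic mean: add the returns and divide by n.
am = sum(returns) / n

# Geometric mean: multiply the factors (1 + r), raise to 1/n, subtract 1.
growth = 1.0
for r in returns:
    growth *= 1 + r / 100          # 1.028, 1.056, 0.961, 1.01
gm = (growth ** (1 / n) - 1) * 100

print(f"AM = {am:.4f}%")
print(f"Total growth factor = {growth:.4f}")
print(f"GM = {gm:.4f}%")
```

It reproduces the classroom numbers: a total growth factor of about 1.0537, an arithmetic mean of 1.375 percent and a geometric mean of roughly 1.315 percent, with the arithmetic mean the larger of the two.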
Now, how will you do that in your calculator? Let me show you in the calculator, and you follow me, okay? I'll go to the calculator function here. In your calculator you have 1.0537, correct? Now press y^x. Follow me carefully, okay? Press y^x and then just 4. Don't press anything else, just 4; don't press equals. So 1.0537, y^x, and 4. On your screen it shows 4, right? Has everyone got 4? Yes. Do I need 4? No, I need 1 over 4, right? I don't need 4, I need 1/4, if you remember. How do I take the inverse, guys? There is a function called 1/x, and 1/x takes the inverse of anything. So press 1/x. What do you get on your screen now? Please tell me. 0.25, yes. I'll repeat: you had 1.0537, and you wanted to raise it to 1/4. So first you press y^x and 4, but don't press equals; if you press equals, it will compound to the fourth power. I need 1/4, so before pressing equals I inverted it with 1/x. Now my screen shows 0.25, so the function is 1.0537 raised to 0.25, which is what I want. Now press equals, and tell me what you get on your screen. Yes, is everyone getting this? Am I correct? Are you getting 1.0132, something like that? Right. Just subtract 1 from it and multiply whatever you get by 100. So your answer will now be... can you tell me your answer to four decimals? 1.3? Anyone? Okay, 1.315, right? If you have not got this answer, please tell me. 1.3154, yeah. So what did I do here, guys? In the geometric mean, if you notice, I took the return for one year, compounded it with the second, third and fourth, and then took the average of all of them. Isn't it? So can I write this as 1 plus the year-one return, multiplied by 1 plus the year-two return, multiplied by 1 plus the year-three return, multiplied by 1 plus the year-four return, and so on? And whatever number I got, what did I do with it? I raised all of it not to 4 but to 1/4, because I am
taking the average of everything. Correct? So I need 1 over 4. Then I got the answer, but since everything is in factor form, you need to subtract 1 from it, and then multiply the final answer by 100. Everyone? So: 1 plus the first-year rate, times 1 plus the second-year rate, times 1 plus the third, times 1 plus the fourth, and then everything in the bracket raised to 1/4. That is what we did, correct? See, this is how it is: GM = [(1 + R1) × (1 + R2) × ... × (1 + Rn)]^(1/n) - 1, where n is the number of years, here four. Clear, everyone? Does anyone have any doubts here? What is your observation here about the arithmetic and geometric means? Can you tell me what you noticed? Yes: geometric is compounded, and arithmetic is a simple average. And if you notice the final answers, guys, the arithmetic mean is greater than the geometric mean. Isn't it? Yes, and that is a very, very important point the Institute may be interested in. It's not interested in your calculation; it knows you will be able to do the calculation, because even robots can do that. What it is interested in is your observation that AM will always be greater than or equal to GM. That is going to be the conclusion. Can AM be equal to GM? Yes: if the returns of all four years were the same, if there were no difference in the returns, the arithmetic and geometric means will give me the same answer. Got it? Yes. So when all the observations are the same, that is, there is no variability, AM and GM will be the same. But at no point can AM be less than GM. Clear? So I can say AM will always be greater than or equal to GM, but AM can never be less than GM. Clear? So which is more reliable, AM or GM? Well, for that we'll have to wait and see.
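A small experiment makes the AM versus GM observation concrete. The three return series below are hypothetical; each has an arithmetic mean of 5 percent, and only the amount of variability changes. `am` and `gm` are illustrative helper names.

```python
def am(returns):
    """Arithmetic mean of returns in %."""
    return sum(returns) / len(returns)


def gm(returns):
    """Geometric mean of returns in %: compound the (1 + r) factors, take the n-th root."""
    growth = 1.0
    for r in returns:
        growth *= 1 + r / 100
    return (growth ** (1 / len(returns)) - 1) * 100


same = [5.0, 5.0, 5.0, 5.0]          # no variability
mild = [4.0, 6.0, 4.0, 6.0]          # small swings around 5%
wild = [25.0, -15.0, 25.0, -15.0]    # large swings around 5%

for name, rs in (("same", same), ("mild", mild), ("wild", wild)):
    print(f"{name}: AM = {am(rs):.4f}%, GM = {gm(rs):.4f}%, gap = {am(rs) - gm(rs):.4f}")
```

With identical returns the two means coincide; as the swings around 5 percent get larger, the geometric mean falls further below the arithmetic mean, and the arithmetic mean never ends up below the geometric one.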
The more uncertain the cash flows, the more misleading the answer AM will give, and therefore the geometric mean makes more sense than the arithmetic mean when the uncertainty, or variability, in the returns is large. You will not fully understand this point until I do the example later on, but for your reference it is very important to understand that uncertainty in the cash flows, or returns, causes AM to be larger than GM. So the more the uncertainty, the bigger the difference between AM and GM. Clear, everyone? Correct. So, guys, hold on: how did I calculate AM? The sum of the returns divided by n. And the geometric mean? The compounded factors raised to the power 1 over n, minus 1. When to use AM and when to use GM, that is important. See this, guys: AM is simple; you take the year-on-year returns and divide by n. AM is to be used when you are focusing on year-on-year, point-to-point returns. So let's say you are analyzing the average yearly return of a stock over the last five years. What is my example? You are trying to analyze the average yearly return of the stock for the last five years. Can I say the returns are yearly, year on year? In that case you will go for AM, because you take the year-one, year-two, year-three returns and look at the average to see what the yearly return is. Clear, everyone? What is the objective there? Let me give an example: I want to see the average yearly return over the last five years. Do you think the client is interested in investing for one year or five years? I need the answer: one or five? When I am looking at the average yearly return over the last five years: one year. Correct. You are trying to see what the yearly return will be; that's why you are looking at the one-year return. But you are not sure what the one-year return will be, so you check, over the last five years, what
happened for one year. Correct? So, guys, whenever you are trying to find the average return over a one-period horizon, measured over, say, five or ten years, for example the yearly return over the last five years, we use AM. But, guys, if I am looking to invest for the next five years, I probably want to see what happened to someone who held the investment for five years. And if someone was holding for five years, can I say he was also reinvesting in between? Yes. And whenever reinvesting comes in, what type of return do you look at, simple or compounded? Compounded, so the geometric mean. So if I'm looking for the average return over a five-year holding period, assuming someone held the investment for five years, I will go for the geometric mean. But if I'm looking for the average yearly return, year on year, over the last five years, I will go for the arithmetic mean. Fortunately for you, in the CFA exams you may not come across a situation where you have to decide between AM and GM; CFA Institute will tell you whether to calculate AM or GM. Got it? But just for your knowledge, in real life, when do I calculate which? If a client comes to me and says, 'Kunal, I want to invest for one year; what is this stock's average one-year return over the last five years?', then I will check the year-on-year returns for the last five years and calculate their average; that is AM. But if the client says, 'Kunal, had I invested five years back, what would my return be today?', that is, compounding from five years back until today, how much he must have made, in that case I will go for GM. Clear, everyone? Yes. Now, I concluded something: which one has more of a problem, AM or GM? When there is a lot of variability in the returns, which one gives me a problematic answer? AM. Right. So let's discuss that first. Here is a simple example I've given. Can someone read it for me? I would like to try this. Okay.
I'll read it. 'If an investment doubles in value...' Guys, when an investment doubles in value, how much return do you make? 100 percent. Right. 'If an investment doubles in value in the first year, and then the value falls to half of the first-year-end value...' So your value is going down, a negative return; by how much? By half. So in the second year, when your value becomes half, you make negative 50 percent. Correct? Right, yes. Let's assume you started at ten dollars: you invested in a stock at ten dollars. What happened to the stock at the end of the first year? It turned into 20. And what happened in the second year? From 20 to 10. See, the value falls to half of the first-year-end value, so what is the value again? 10, isn't it? Without calculating, can you tell me the return for someone who started at this point and ended at this point? Zero. Everyone agrees? Yes. But see the problem: because the returns swing so much, from plus 100 to minus 50, what will AM do? If I calculate AM, it will be 100 minus 50, divided by 2, so AM will show me that the average return is 25 percent. Isn't that wrong? The first-year return was 100 percent and the second-year return was negative 50 percent, so AM is showing me a very wrong picture, an average return of 25 percent, when in reality I know I have made nothing. Isn't it, everyone? Aren't you surprised by the difference between what you were probably doing until now in your other education and what is happening in CFA? It's that self-realization of how wrong, or how less informed, we were. This is the self-realization I also came across when I was doing CFA: in all my graduation, I never knew that calculating AM could be such an issue, with such a big difference. Got it, everyone? And therefore GM is more reliable. Why? How do we calculate the geometric mean? What is the first-year
return? 100. So when you add 100 percent to 1, what will 1 become? This is 1 plus 100 percent; if it were 1 plus 40 percent, you would write 1.4, right? So 1 plus 100 percent: what will 1 become, everyone? Have you forgotten the geometric mean? 2. 2. And then the second year: if I add 50 percent to 1 it becomes 1.5, but if I subtract fifty percent from 1, what will it become? 0.5. 0.5. So my calculation will now be the first-year factor, which is 2, multiplied by the second-year factor, which is 0.5. Everyone with me? And this will be raised to the power of how much, guys? 1 by 2. Yes, 1 by 2. Why? Because it is two years. So the geometric mean is (2 × 0.5)^(1/2) - 1. In your calculator, 2 multiplied by 0.5 is 1, and to raise it to the power you can use the y^x key with the inverse of 2. But raising to the power 1 by 2 is also known as taking the square root, isn't it? So is there a square root button on your calculator? Can you see it? Yes. So just press the square root button, then subtract 1 and multiply by 100, and you will get the answer: zero. That is the return we got over the two-year period. Everyone? Yes. So what is the conclusion? When there is too much variability in the returns, which is more reliable, guys, AM or GM? GM. Right. So let's just read the important points. The more uncertain the returns, the greater the divergence, and the bigger the gap between AM and GM: here AM tells me 25 percent and GM tells me zero percent. In fact, with negative or extreme returns, AM will give me an inaccurate answer, while GM will give me a more accurate one. So whenever returns swing between positive and negative, meaning there is a lot of deviation in the returns, GM is more reliable than AM: the geometric mean is more reliable than the arithmetic mean. Clear, everyone? Yes, sir. Right. So we discussed the mean, and within the mean we have the arithmetic and the geometric mean, either of which can be for a population or a sample. The arithmetic
mean is the simple average, which you calculate year on year for a particular period; the geometric mean is the compounded return over a period of time. AM is always greater than or equal to GM, and AM equals GM only if all the returns are the same. If there is a lot of divergence or variability in the numbers, AM will give you inaccurate results and GM will give you more accurate results, and therefore GM is more reliable than AM. Everyone clear with this? Okay. Now, while I was making this example, where the stock went 100 percent up and then 50 percent down, do you think something like this can happen in real life, doubling in one part and halving in the other? Yes, and it actually happened recently, with an investor named Ryan Cohen. This is an interesting case; let me share the link again. Sorry. Ryan Cohen was the co-founder of a startup called Chewy. Is anyone aware of this startup? No, sir. Okay, can you see my screen, guys, my browser? Let me know once you can see it. Yes. So Ryan Cohen founded the startup called Chewy, which is basically a company for pet products and pet food. He sold Chewy in 2017 in a deal worth about 3.35 billion dollars.
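The doubled-then-halved example and the AM/GM summary at the start of this part can be checked with a short Python sketch. It is a minimal illustration, assuming returns are written as decimals (1.00 for +100%, -0.50 for -50%); the helper names are hypothetical, and the +/-x return pairs at the end are made-up inputs chosen only to show the gap widening.

```python
from math import prod, sqrt

def arithmetic_mean(returns):
    """Simple average of per-period returns, as decimals (1.00 = +100%)."""
    return sum(returns) / len(returns)

def geometric_mean(returns):
    """Compound average per-period return: (product of (1 + r)) ** (1/n) - 1."""
    return prod(1 + r for r in returns) ** (1 / len(returns)) - 1

# The lecture example: $10 doubles to $20, then halves back to $10.
returns = [1.00, -0.50]
prices = [10.0]
for r in returns:
    prices.append(prices[-1] * (1 + r))

am = arithmetic_mean(returns)      # (100% - 50%) / 2 = 25%
gm = geometric_mean(returns)       # (2 x 0.5) ** (1/2) - 1 = 0%
gm_via_sqrt = sqrt(2.0 * 0.5) - 1  # calculator route: square root, then minus 1

print("price path:", prices)
print(f"AM = {am:.0%}, GM = {gm:.0%}")

# AM equals GM only when every return is the same...
same = [0.08, 0.08, 0.08]
print(f"identical returns: AM {arithmetic_mean(same):.4f}, GM {geometric_mean(same):.4f}")

# ...and the AM - GM gap widens as the returns swing further apart (+x then -x).
for x in (0.10, 0.30, 0.50):
    pair = [x, -x]
    gap = arithmetic_mean(pair) - geometric_mean(pair)
    print(f"+/-{x:.0%}: AM - GM gap = {gap:.4f}")
```

The square-root line mirrors the calculator steps described earlier: multiply the growth factors, take the square root because there are two years, then subtract 1.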
And in 2017 he invested that money in two stocks. So all the wealth that he had accumulated, he invested in stocks, and do you see what happened? One of the stocks went up 100 percent and the other went 50 percent down. Can you see that on the screen? Yes. So he split all his money between two stocks: one of them doubled and the other halved. Is the chart visible to everyone? Yes, sir. So what did he make? What was his overall return? Can you tell me? Zero. Zero. He did not make anything, but he did not lose anything either, right? And the stock that doubled in that time period was Apple, and the other stock, which halved, was Wells Fargo. Of course, the pandemic year was the worst year, and after the pandemic we had the best year in the stock markets. So in the pandemic year he actually didn't lose anything, because his overall portfolio return was zero: one stock went up and the other went down. Right? So this was an interesting real-life story related to the example we just saw, and that's the reason why I was sharing it. Now, to conclude the story about this investor, what was his name? Ryan Cohen. Ryan Cohen is the same investor who, last year, took whatever money he had and invested it in a company that is very famous now. He invested in that company last year at rock-bottom prices, and the company's name was GameStop. Do you recollect anything about this company? GameStop. What happened with GameStop recently? On Reddit, yes. Somebody posted, and the stock went from ten dollars, I think, to 200 dollars. Yes, and that stock rally was driven by retail investors, the investors on the platform of
Reddit, right? And hedge funds lost a lot of money because of the short squeeze. There were so many media reports and articles; it was such an eventful episode. And the one who had invested in GameStop last year was the same guy, Ryan Cohen. So that was an interesting story in this regard. Okay, so is everyone clear with AM and GM? Everyone? Yes. Right, okay guys, fine. Let's take a half-hour break and we'll meet after it, so eat something and come back, and in the last part of the session we'll do a few more things and discuss them. Okay.
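One subtlety in the Ryan Cohen story: the doubled-then-halved example is about one investment over two consecutive years, where compounding gives zero. Two stocks held side by side over the same period work differently: a buy-and-hold portfolio's return for that period is the weighted average of the two stocks' returns, so with a 50/50 split the result would be +25%, and it comes out flat only if about one-third of the money sat in the stock that doubled. The split in the story is not stated, so the weights below are hypothetical, and `portfolio_return` and `break_even_weight` are illustrative helper names, not from any library.

```python
def portfolio_return(weights, returns):
    """One-period return of a buy-and-hold portfolio: weighted average of holding returns."""
    if abs(sum(weights) - 1) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * r for w, r in zip(weights, returns))

def break_even_weight(r_up, r_down):
    """Weight in the rising stock that makes a two-stock portfolio return exactly zero.

    Solves w * r_up + (1 - w) * r_down = 0 for w.
    """
    return -r_down / (r_up - r_down)

# Hypothetical: over the same period one stock doubles (+100%) and the other halves (-50%).
stock_returns = [1.00, -0.50]

equal_split = portfolio_return([0.5, 0.5], stock_returns)  # 0.5 * 100% + 0.5 * (-50%)
w = break_even_weight(*stock_returns)                      # 0.5 / 1.5, i.e. one-third
flat = portfolio_return([w, 1 - w], stock_returns)         # zero, up to float rounding

print(f"50/50 split: {equal_split:.0%}")
print(f"break-even weight in the rising stock: {w:.2%}")
```

The weighted average applies within a single period across holdings, while the geometric mean applies across periods for one holding; keeping the two apart avoids mixing up the two calculations.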