Transcript for:
Essential Business Math for Data Professionals

I am glad to announce the beginning of the new tutorial series called business math and statistics for data professional this series will be very helpful if you are targeting data analyst business analyst or a data scientist career I myself worked in data industry for more than 12 years and one thing I realize is that 80 percent of your data work can be done by 20 percent math and statistics see if you look at math and statistics it's a huge field you don't need to know everything you can just learn 20 of that and get 80 percent of your work done especially if you are targeting data analyst or business analyst career for data scientist career you need to know a little more advanced math now if you look at this diagram here what I have shown is the intersection between business math and statistics and business logic and by business logic what I mean really is this let's say you're working as a data analyst or data scientist in Expedia or rooms which are Hospitality domain companies then you need to have some domain understanding for example how hotel bookings work what is occupancy rate what is booking capacity all of that you need to know and that falls under business logic this is something that typically business managers would do but as a data professional you also need to have some understanding on the domain on the business side of the things then comes your mathematic and statistics skills now you can learn basic math and statistics such as some percentage variance standard deviation you know correlation mean mode median and these things should be sufficient and these two will combine to form something called business metrics or kpis by kpi I mean key performance indicator when you're looking at any business key performance indicators are Revenue profit you know if you're looking at Apple how many iPhone they sold last year what are my top five products who are my top 5 customers if you are in a service industry so these are all called kpis or business metrics and it is an intersection of business math statistics and business logic now if you add a third dimension to it which is a tool skills because see to do data analysis or business logic and business math is not enough you need to know some to such as Excel python R power bi and the intersection of this is basically your data analysis in order to do data analysis you need some tool which could be Excel python Etc but just by learning tools care you're not going to do data analysis this is the classical mistake that people do when I'm talking to any uh data aspirant I asked this question in last one month how much time you have spent in business logic or core skills such as communication presentation and usually the answer is close to zero people focus so much on tool but this diagram will make you realize the real data analysis can happen if you have the right combination of these three skills and in this video series in this tutorial series you will find a playlist Link in the video description below I am going to be teaching you all these three skills using very very simple examples so that even a dumb person should be able to understand these things easily as far as the tool is concerned we are going to use Microsoft Excel because that is the number one tool that people use when they are performing data analysis especially on smaller data set so believe it or not Microsoft Excel is still very very widely used and that is the reason I have created Microsoft Excel course which will be available on my website codebasics dot IO in March 2023 so check the description below uh you might find a course link there once it is launched and that course will contain lot of Premium material uh but I have taken some of the material from that course and designed this entire playlist so this playlist will have let's say 10 percent or five percent of the content from my premium Excel course okay now since we are using Excel as a tool skill uh in the next video we are going to look at some Excel Basics you might know some Excel already but I'm going to quickly cover some of the very very basic things in Excel and few Advanced things as well and then we will start performing uh some business related data analysis in the last video we said that as a tool we will be using Microsoft Excel for this entire series therefore I am going to be covering some Excel basics in this video I have Windows machine where I have Excel installed already so I'm going to launch that application directly if you don't have Microsoft Excel installed you can install it it's a paid software if you are a student they give some discount if your college has association with Microsoft uh you can also use the online version of Microsoft Excel where the Excel will run in the browser and that should work okay but I will leave it up to you on how you want to install Excel let's say you have it installed when you launch it let me go over this the tool and the anatomy of the application so at the top you see this menu bar where you see variety of options here what you are seeing is the spreadsheet okay and spreadsheet has rows and columns so one two three these are row numbers ABC these are columns and at the bottom you have sheet so you can have multiple sheets by the way okay so let's say you are trying to maintain your monthly home expenses in the Excel file that's a very very first basic use case you should on whenever you are learning except and let's say I am maintaining those expenses month by month so I will say may 2023 June 2023 for each month you can create a separate tab when we started my company athletic Technologies which is a software development company we used to maintain all the expenses in this kind of sheet you know month to month expenses for the company uh and here you can type in your expenses so you can say okay first me on May Ice spent on home rent I spend let's say 500 okay so this home rent is let's say an item and this is an amount this is the category because you you're going to categorize your expenses like rent and food and utilities and so on and date and in Excel you can drag and drop so when you drag and drop Excel is smarter and it will figure out that this is a date so it will increase the date okay but let's say uh for first May itself I have a few expenses so let's say I have five expenses for first May and those expenses I am going to just copy paste to save my time so for copy paste you can say Ctrl C Ctrl V I'm going to provide this Excel file check the video description down below then you can have okay let's say this is my second meal for second may I have once again four expenses and thirdly I have one expense okay so I have this other sheet and I'm just going to copy paste data so that it saves time on recording okay you love Samosa so there is Samosa shop nearby your home there is a tournament panipuri so you are eating all of that and um you're maintaining all your expenses here okay it is a common practice that the header you want to highlight in a special color so that you know it is a header and then if you use B it will make it bold so now my expense sheet looks little prettier all right you can also use conditional formatting where let's say if the expense is more than 100 you want to highlight it okay so let's do this you can select this column and you can say highlight and greater than so if this is greater than let's say 100 then I want to highlight it so now what happens is whenever your expense is more than some amount it will highlight it this kind of highlighting helps um as I said I have worked in a industry for many years and when I used to deal with this big Excel files especially at my stay at Bloomberg in New York office we used to get huge Excel files and if you have this kind of conditional formatting that will always help you have some data point which are extreme or which are which are creating some kind of trigger you can highlight those with certain numbers so that you can quickly look at them and do your analysis okay now let me show you some basic formulas in Excel okay so let's say you have all these expenses naturally the first operation you might want to do is you want to know your total expense how much I spent in total in month of May you can do that by just highlighting left Mouse click click here and at the bottom it will show you the sum 1010 or you can type in a formula and for formula what you need to do is you you can just say is equal to sum so sum is the formula so it is showing you in autocomplete you open a bracket and then left Mouse click you highlight all this cell you can type this thing manually also D2 column d11 bracket close and that shows you a sum okay so this is my total you can control scroll down to reduce the font size or you can reduce the fonts this is the not the font size this is really the view okay you can reduce font size from here um you can also do autosum by using this particular this particular option okay you can also sort the numbers by clicking any number here and you can hear say sort let's say smallest to largest and see it will sort it like that but if you want to sort by dates you click here and once again you can say newest to oldest or oldest to newest and it will sort things like that all right now each block the square that you are seeing is called cell so this thing is a cell this thing is a cell this whole thing is a spreadsheet uh and you can call it spreadsheet or sheet so sheet1 sheet two you can rename it and this whole thing the whole file that you're seeing is called workbook so the file you can say by saying this save okay so you can save to let's say this PC I want to save it on desktop I want to call it expenses expand sis okay so expenses dot xlsx is the extension for Excel file now uh when you have numbers you can do formatting let's say this amount is in dollars so if you click on it see it will show it and it is a dollar percentage also you can do but I don't want to have any formatting here so I'll just say general which is no formatting one other operation you might be interested in doing is you have this categories I want to know the total expense per category on food in the month of May how much money I spent on utilities how much money I spent so you can here create another kind of a table where you will say category and total in category you have food utilities Etc now if you don't want to write all of this you can use unique formula so you can say equal to unique and in the unique so see food utilities rent and if you want to sum them see I can use some I can use some correct but that's the total sum I want to sum only food items for that there is a function called sumif so In Sum if you will specify on which category you want to do that grouping so you first specify the category column then you want to say I want to filter only food okay food comma How do you want to sum so this is my sum range so in this range I want to sum them up so when you do this you get 190 so that is your total food expense you can control C control V here see by default it will change this thing so you can fix it B2 to B12 or b11 rather okay and the 2 to D 11. so utility this much expense and sum if once again this not this actually this comma this and this is the range okay if you want to verify that this total is correct or not you can do it using a filter so you can go to data click on filter now it's filtering per category so see for food on the right hand side you see 190 so I will say food okay food I have this much you click on it and at the bottom you see 190 so that 190 is verified similarly if you do utilities for example you see here you see 320 so when you click and remove it utility UC 320. so things are correct okay now these are just uh General cells uh in the Excel file there is something called table two and table will allow you to do some Advanced things so here whatever I have created it looks like a table but you see the table is pretty big right it's a whole sheet and in that you have bunch of numbers and text here but if you want a proper table you can select this and you can say insert table and just say okay now what happened is this thing become a table so when you highlight your mouse cursor here see at the top you don't see table design option but when you are here you will see this extra option called table design and when you click on it you can name the table so I am going to say name of this table is expenses and how does naming the table helps well now when you want to sum things up you don't need to say sum D2 to d11 because that's cryptic and you know tomorrow I enter a new row I have to modify the formula and all that now I can do something like expenses say because expenses is my table and when you put this bracket it will also show you the column name and you can hit tab bracket close now expenses now see this is much more readable than just saying amount or just saying D2 to D12 what does it mean need to to do D12 it's just random cells But Here There is a table name expenses and there is a column name amount so this looks much more readable also the other benefit is let me just put it here Ctrl X control V whenever you are adding a new row for example here let's say you right click and you say insert rows below and let's say on third may you incurred another foreign [Music] by the way um and I don't like all this pizza and burger all that I like to eat or Indian traditional food anyways when I inserted this new row it became part of the table see the it is highlighted so it became part of the table not only that the formula got modified the formula is now taking that into account formula is Same by the way but since the new row is part of the table it came here automatically so that's another benefit one more benefit is you select the table and you can just say insert slicer and you can say let's say you want to do category wise breakdown so you can just say category here and this slicer will allow you to filter so I want to now see only food items only rent only utilities see so you get all of these useful features if your cells are table if if you have the general this is not a table but if you have just if you click it see there is no table design but if I click here there's a table design you can insert slicer you can do so many things you can by the way insert total row as well so you don't need to even even type this in you have a total row by default you can set total some average countless I want to know count how many items are there 11 okay uh I want some here there are many more functions and you can do min max all basic arithmetic so these are the benefits of the table now that you have understood Excel Basics let me introduce movies data set and this is the data set we'll be using in many of the future videos in this playlist that's why I want to just give you a broad introduction on this particular data set here you see multiple sheets okay so first one is movies where we have movie name which industry Bollywood or Hollywood uh their Studio IMDb rating budget Revenue Etc and in the second one you have Financial so by the way if financials I have used some formulas and I have merged this table into here okay you can see that we have used X lookup and then you go active stable let me delete all these extra things uh movie actor languages and so on so movies has movie ID okay and that movie ID you can use to link with the financial table so let's say this kgf chapter 2 it is 101 so it is 101 the budget and revenue was 1 and 12.5 billions INR so we used 101 movie ID to pull this revenue and currency I'm not gonna go in X lookup formula but X lookup formula is something that you used to combine these two tables uh the information on this x lookup formula and all those details are available in my uh Excel course uh in terms of languages see movies have a language ID let's say three kgf chapter 2 is 3. so what is the three language kannada okay so that's a simple table then you have a movie actor so 101 is kgf2 the movie actors are 50 and 51 so what is 15 51 50s yes and Sanjay dutt so if you have seen kgf 2 movie you know yes and Sanjay dutt are the actors okay so these are the things we have let me delete it this is an extra thing so we don't need it here so these are the tables that we have now in the future videos we will be performing some data analysts on on this particular uh data set as well as we'll be using some other data sets too all right so keep on watching if you like this video please give it a thumbs up and share it with your friends who want to get into Data industry on this movie's data set now let's perform the basic business arithmetic here when you're looking at the number of this question that you have is what is my total budget or what is my total revenue and you can do that easily by doing some of movies and let's say budget and that is a total sum but is this number correct well it is not because these numbers are either in billions or millions and that currency is also different so you can't add numbers if they're on a different basis altogether so you have to standardize them to a common currency and a common unit then only you can sum them up so let's do that process right now so I'm going to delete this because this number is anywhere wrong I will do Ctrl scroll down so that the font size reduces and then I will create I will drag this and I will create okay Ctrl C Ctrl C I will create two new columns so I will call this budget in million so first I want to create a column where all numbers are in billions and why do I do that because when I look at this unit I have either billions or Millions so I can easily convert billions into millions and get all numbers in a single unit which is millions and I will do the same thing for Revenue so I will say Revenue million enter okay now how do you convert billions into millions so one billion is 1000 million so what you have to do here is use if formula and you can say if my unit is in billions so if this is in Millions then multiply the budget number with thousand okay otherwise whatever number is there skip it so now you see that I'll just do Ctrl score to increase the font size now you can see that the one billion number is converted to thousand but 200 million number remained as is so you can say that all these numbers are now in common unit which is millions and you can do the same thing for Revenue so here I'll say once again if the unit is billions then when you multiply it by thousand else Revenue brackets close complete 12.5 is converted to twelve thousand five hundred which is good but the 954 million dollar is as it is it did not change which is good okay now let's convert these numbers into common currency so I'm going to use INR so I will have budget million hours but I'm feeling lazy so I will not write million just understand that this is million so this will be budget and uh Revenue INR and let's convert this number into INR so if it is INR already you don't need to do anything but if it is USD then you need to multiply this number with USD to ini conversion rate right so let's do that so if my currency is USD then multiply this number with 80 80 is the current USD to INR rate it keeps on changing but I'm just going to estimate it to be 18. otherwise keep the budget in millions okay and do the same thing for this one as well so if my currency is equal to USD then multiply this number by 80 else keep this number done okay and you see that this is the number for kgf 112 500 it remains same because we are in INR but this number since it was in USD we multiplied by 8 80. so 200 into 80 is 16 000 and we did the same thing here okay now if you want to get the total budget and revenue in a single currency here you can do that so here you can just sum this entire this entire okay I'm just gonna do it some movies budget inner budget inner bracket was missing some movies Revenue anarch you see it's so easy actually so these are INR numbers so you can go here and you can select the linear currency so you can go to more accounting format scroll down scroll down and whenever you see INR symbol say English India so these are all INR numbers and what is this exactly so let me just Ctrl X control V this number is budget INR okay so budget kind of total budget you can highlight these cells as well in a spatial color and this is total revenue in INR and these are million so this many million INR it's a big number but yeah all these movies are doing so much business nowadays kgfrr if you're watching Bollywood movies you know they they make so much money and and by the way this is all okay so this is Hollywood and Bollywood both so we can understand movie industry right now they make a lot of money and if you want to do the same thing for let's say USD so it's a budget usde and revenue USD so what is budget USD so here you will say if the unit or the currency is in INR if the currency is in INR then you want to divide that number so you want to divide this number by 80. once again 80 is an average when I was at Bloomberg also uh you know when I worked at Bloomberg New York office uh we used to have these kind of issues with currency and many times we'll use the average of of a quarter let's say you are looking at earning estimates of any company and you have two cells which are in different currency uh you will take an average and sometimes the Brokers will provide the currencies portrait conversion rate for average like average let's say average of last three months or average of last whatever time period and they will provide that if you want or something real time you can call an API through Excel and get the real-time spot rates portrait is nothing but a currency conversion rate okay so you do that bracket complete and then once again if your currency is then Revenue divided by 80 else that and you can do quick verification so here budget in USD so budget million was this much so this divided by 8 will be this number and but for this number since it is in USD already uh you kept the number same all right and here you can just sum it up let's see if this formula is going to work yes it is working okay so you got all these numbers now let's do some other type of analysis so what are my total number of movies so I will say my total number of movies is equal to count count of all the movies so you can take any any column actually but I'm just gonna take movie ID okay and so my total number of movies is 39 okay and what are my Bollywood movies let's say I want to do some analysis on Bollywood movies such as what is my average Bollywood movie Revenue okay so let's do that so first you need to count total Bollywood movies so total number of Bollywood movies are count of movies but if you do count it it's not gonna work because you want to do if so there is a thing called countif where you specify the movie because you want to just you know if you specify the column on which you want to apply criteria in your criteria is I want to count only Bollywood movies which is 18. and what does total Bollywood movie Revenue in INR we can do INR USD but since this Bollywood will do Bollywood movie Revenue so what is my Bollywood movie Revenue obviously you have to use some if see here whatever you calculate in the yellow cell that was total all the movies Bollywood Hollywood here I want to only count um for the Bollywood movies so sum if what is my range so my range is industry okay on the industry you want to apply Bollywood criteria so in the industry column it will filter based on Bollywood text and then you want to sum how do you want to sum so in the movies you want to calculate the revenue so whatever Revenue INR you have you want to sum that up okay so this is my total Bollywood Revenue okay and what is my average Bollywood Revenue average revenue in INR once again this is Millennial right folks that is understood that is simple average between this number and that number and you can okay now one question that you might have is we said 18 but out of 18 movies there was I think a movie which which did not have any data not available is not available just say okay this is the total number and you can say average Bollywood Revenue percentage what is my percentage average Bollywood Revenue that would be okay so you have to set total Bollywood Revenue divide by total revenue from all the movies and if you can take percentage five percent okay it is quite less here so this number is basically your percentage Bollywood Revenue percentage Revenue from Bollywood with respect to all the movies and this is five percent because first of all our data set is not complete we don't have all the movies it's just a small sample and the second thing is you know this Hollywood movies let's say if I'm watching the Hollywood movie in us usually it earns more money because it's dollar and dollar gets converted to 80. so naturally Hollywood movies make more money all right that's all we had for this video in the next video we are going to cover basics of profit and loss statement we are going to discuss some of the important business metrics in this video and we will use Marvel Studios as our case study you all probably watch Marvel movies there into this business of making movies and it's a business so they will have their own profit and loss statement and when I say profit and loss statement what it means is uh they have to spend certain amount of money to make these movies which is called movie budget then they release the movies obviously you go watch movies and it will generate uh revenues for the business so that is for Revenue so there is a revenue then there is a cost which is your budget and if you subtract the expense from revenue what you get is profit so that's a basic uh definition of profit and loss if expense is more than Revenue then you get lost uh if it is less you get profit so if you Google Marvel profit and loss statement you will find various websites where you will see a complete statement here and here what this says is 66.53 was the total revenue this is probably either million or billion dollars and uh this uh is the you know March 10 as of 12 month time period so that's like the duration uh then you have expenses so from revenue you will subtract all the expenses the cost of material consumed Financial cost Etc and this is your total expense and since the expense is more than your Revenue you will have loss so see minus 16 and then there will be taxes and after you uh deduct taxes you'll get net profit so that's a detailed uh statement but what we are going to do is use a very very simple statement so we already have the movies data set with us okay so let's let's open that here and in this we have the information on Marvel movies here see Marvel studio so if you filter things by Marvel Studios you will get all the Marvel movies and we're going to create this Marvel financials where we will have marvelous Revenue then budget profit and loss profit and loss in percentage and so on so we are going to uh create all those metrics here okay so let's pull Marvel's Revenue so obviously you can use some formula and in some you can go to movies table and if you do Revenue USD you get total revenue so this is a total revenue which includes movies which are not from Marvel as well but you want to include only Marvel movies so we have looked at this before you need to use some if formula where you're saying if the movie studio is Marvel then only some uh the revenue so here's first you're specifying a range so you can go to movies and you can specify this range because uh okay I think Studio because you are filtering based on the studio so in some if the First Column is the column on which you're filtering okay so that is your studio and what is a filter criteria Marvel Studios see Marvel story you see Marvel Studios here so that's your filter criteria and after you have filtered what do you want to sum on well you want to make a summation on this thing which is revenue USD so you're going to select that okay and finish that and that's your total millions of dollar okay so this is nine billion dollar in total that's your Marvel's Revenue if you want to verify anything you can go to Studio here and just filter things out okay so I'm just gonna select Marvel here and I get some eight movies and if you just you know select these numbers you will get 9054 which means our number is correct see nine zero five four now you can do the same thing with budget 2. so I'll do control C control V and here instead of Revenue USD I will pass budget USD okay so you get this now folks just be careful when you are doing these formulas here right now I'm supplying movies Revenue USD uh if see in the last one I selected okay let me just first clear that filter because if you remember we have a filter on the studio so I'm just gonna clear that and then go back to this one and look at the third column see here when you select this you get movies Revenue USD but if you select the whole whole column including header you get movies all revenue USD in that case you can get a wrong number six one eight seven I actually don't know why this is happening but I have seen this wacky Behavior with Excel so make sure you're consistent if you have all here and all here it will work okay but you need to be consistent in terms of range and some range so I'm just going to delete it and just select select from the First Data row don't go from here otherwise it will include header so select from First Data row go all the way hit enter and then it works okay okay so my what is my profit profit is simple it is revenue minus budget that's business profit if my budget was more than Revenue then I would end up in a loss and in the financial statements they just say profit slash loss and whenever the number is negative it is common understanding that that means it is a loss okay now what is percentage profit loss well this number is on which Baseline should I use revenue is my baseline or budget is my Baseline well it's it's budget so it's like your Marvel movie producer you spend 100 million so that is your budget and let's say if you made a profit of 50 million dollar that's a 50 profit but that is always calculated with respect to budget therefore I will set this divide by this because that is your budget and if you convert this to a percentage which I can do by clicking here you get amazing 355 percent profit and this is based on the limited data that we have uh this is not the real Marvel profit yeah okay this is based on our data set if you want to look at real uh pns statement for Marvel you can google and find it out okay so that's my profit now I want to know market share so what is market share so market share is basically if total Hollywood movies revenue is let's say 100 million and Marvel is 20 million out of it then Marvel's share is 20 it's like a pie chart in this pie how much is the revenue from Marvel okay so in order to calculate market share you need to First calculate total revenue of all Hollywood movies okay and that you can calculate using Summit function once again so I can say sum if I don't need to even now look at the movies table because you know I know that it's it's called movies and when you are saying Hollywood you're basically referring to the industry column so in the industry column give me all the records which has Hollywood in it and then sum the revenue so revenue is present in which column this one correct so this is total Hollywood Revenue I should say total Hollywood Avenue and my market share would be what is my marvelous total revenue marvelous total revenue is this and what is total Hollywood Revenue it is this and if you convert that to a percentage it is close to 49 I mean you can increase the decimal and you get the better number but it's okay I don't want to have any decimal so total market share of Marvel in our data set okay not in reality in our data set is 49 percent which means 49 percent of the revenue generated by total Hollywood industry is coming from Marvel at 49 of debt total is coming from Marvel okay I hope folks just think about it this is common sense so you should be I think able to get this pretty easily so we can generate this inside that Marvel is almost half of the Hollywood Revenue I know in reality it is not but based on this data set see you're generating this this so this kind of uh insights where you need to learn how we can talk in business terms as a data analyst or as an Excel analyst when you're working with other other business stakeholders or don't use this number that okay Marvel's revenue is 35 percent you can say Marvel's revenue is almost one third of the total Hollywood industry or here Marvel's revenue is almost half of what is generated by the entire Hollywood industry so when you are using this kind of business language you can do effective communication so that's very important learning business communication where you put all your technical uh your code monkey behind that's very important skill to have in terms of market share I can go to Google and say India's Reliance jio versus Airtel Market sale so in in India there are these wireless companies ranjio Airtel and so on and see you see this nice picture so if you open this picture if you say right click open imagine new tab see this shows the market share so when they say the Reliance due is 53.92 percent market share now you get it so you have to take a revenue from Reliance you divide that by total Wireless industry Revenue in India and that that's when you get 53 percent Airtel is 26 percent which means if total Wireless industry in India is generating 100 rupees Revenue 26 rupees revenue is going to add and 53.92 rupees revenue is going to Reliance jio you can you can do similar analysis on uh you know zomato and swiggy market share these are like food delivery services in India and you will see various charts for example look at this one open image in new tab here it says eighty percent of online food delivery Market in India is covered by zomato in swiggy Combined 12 percent is ubereat and eight percent is others market share is something which is very important concept and when you go for interviews for data jobs they can ask these kind of questions so it's it's an important concept to learn okay now in every business there are always targets okay so let me talk about Target so if you're looking at you know any financial statement or any if you're doing any studies when I was at Bloomberg in Bloomberg uh I used to work in earning estimates team where every company would have their earning estimates and when the actual numbers come they will be different earning estimate is basically your target so let's say you're trying to hit 100 million Revenue but actually you hit in reality you get only 80 million Revenue which means you are 20 million dollar behind behind the target okay so let's say for my Marvel someone said a target of 8 000 and by the way I'm just going to convert this into just I'll put the dollar sign so that it's very clear and you know what I can I think I don't need to have to Hollywood Revenue here I'll just say Ctrl C and here I will use that formula that way it is more compact so let me just delete this row thank you okay and all these three rows also I want it to be in dollar and let's say my budget was 2000. okay and my profit goal out of my 2000 budget was let's say six thousand okay and my profit percentage was obviously this divided by this percentage okay and let's say I wanted to capture a total of 55 percent market share let's say if that was my goal okay so I can say 55 percent here so this is the target which is set by business in the beginning of the year before the business cycle starts they will say hey this year has started these are my targets okay in my own company right now I'm sitting in uh my own software company at Lake Technologies and we have business targets and when the year finishes we try to compare the targets with the actuals okay so here you will say Target minus actuals so what is actuals actuals is this number the reality and Target is whatever Target you said at the beginning so Target was this in reality you got this Target was this actually you know what in reality you hit this and this is my target so you achieved uh 1000 million dollar above your target that's what this means okay so your budget was this and your actual profit is this and your target was this so you achieved once again 1000 million dollar I can just change it so that these numbers are different okay and percentage-wise you achieved 80 percent more okay because that is your profit but market share wise um you achieved six percent less so your goal was to get 55 percent of the entire Market but you got actually 49 so it is six percent less but in terms of profit you achieved 80 percent more profit okay in terms of budget your budget was actually you know this this particular number should be because this is an expense we need to reverse it so I expected I would spend 2 000 but actually I spent 11 million or less so actually it's this positive thing I I I spent less money okay uh or I mean it depends how you are interpreting it by the way you can also do this one as well see whenever number is in bracket um it means it's negative so in any financial statement uh whenever you've seen any number in a packet that is negative you know that is just a general accounting a convention when it comes to companies financial statements so you have to remember that so here uh this this Factor Target minus actual it actually shows how you perform overall and sometimes you know the bonuses that people receive bonuses that sales managers and engineering and marketing team receives are actually based on this number you know are you achieving your targets if you're achieving your targets how much above you are your target let's say I went one billion above my target which means they did good so they will Marvel Studios will distribute extra bonus to their employees actors and so on but let's say if they did badly and if this number was negative then they will do Evolution you know they'll ask questions why this happened our business did not meet the target what can we do better and so on you can put this into percentage and what this means is the difference that you've got on your Baseline Revenue number is let's say 11 percent dual person you know so you've got 12 percent surprise basically uh so that's that's the way uh to look at it um when I was at Bloomberg when companies meet or beat estimate there will be this surprise number surprise means how by percentage how much you're above so let's say if company's Revenue Target was 100 million dollar but they achieved 120 uh they're like 20 above the target so that would be that will be the surprise factor and the stock prices will move uh based on that number okay so this was a very very simple basic pns statement in the future videos we are going to look at much more detail and advanced statement now many times what happens is you want to do conditional formatting and whenever any of your target is let's say a less than 10 percent you want to highlight that so you can just say conditional formatting and you can say if this number is less than minus 10. then highlight it okay so if it is less than minus 10 percentage then highlighting so then you know on which metric you are doing horrible it's like your threshold it shouldn't go beyond this threshold so in terms of market share you were 13 less than what you uh actually set as your goal so in Excel files um I'm pretty sure when sales and marketing department in big companies they sit together they they would have these kind of Excel files where they are reviewing their targets and they would have this conditional formatting and they would be reviewing those uh you know red highlighted cells carefully all right that's all I had uh for this lecture in the next lecture we are going to look at the basic statistics now we will talk about basic statistics especially mean median mode remember that you can do 80 percent of your data analysis work using 20 basic statistics so you don't have to know Advanced topics such as hypothesis testing uh chi-square test all of those things are needed if you're targeting data scientist career but if you're targeting data analyst or Excel analyst career knowing just basic statistics is good enough it will get 80 percent of your work done so let's talk about mean and median first I will use an example let's say you want to open luxurious car showroom in your town now before opening a store what kind of analysis will you do well you will go to town you will try to measure income level of people and if people are earning a lot of money then only you can afford to have luxurious car showroom such as Mercedes or BMW so doing income analysis is a usual industry use case when you're opening a new store and let's say in my town there are these people let's say six people are living their monthly income is this many dollars you can take a simple average average you know right it's just a sum of all these numbers divide by the total count so here I summed all these numbers divided by 6 so total number of records are 6 and that is your average and average in mean are same thing okay so here the mean value is six to five zero and you will realize that this income is not huge enough that people will be of able to afford a luxurious cars so you will not open the showroom so this is called descriptive Analytics now what will happen is in your town there could be one very rich person let's say Elon Musk is living in your town let's say he's your neighbor well he earns freaking 10 million dollar a month actually he owns probably more but just giving the example in that case whenever you have extreme values like this the average will not represent the true picture because due to the presence of extreme values which are also known as outliers by the way here Elon Musk is an outlier because other people's income levels are five thousand six thousand maximum eight thousand this guy is earning 10 million dollar it's like too high so whenever you have extreme data points uh using average can be risky so what do you do then well just think about it uh one approach you can do is sort the numbers okay so from less number to more in ascending order and try to take the Middle Point so Middle Point is 7000 and that Middle Point is nothing but median folks so you can use median number uh to make your decision so once again I found seven thousand dollar it is not a very high income so I will not open my luxurious uh showroom car showroom here the number of data points are odd that's why middle number is easy what if I have even number of data points I have total eight data points now well then also it's easy so you draw a middle line here then you take those two values you you basically have two values in the middle and you just average it out so average of seven thousand and eight thousand is seven thousand five hundred and that is your median so see if median is very easy if you have odd number of data points it's a middle number if you have even number of data points you take an average of two middle numbers now let's look at mode so what is more okay when you want to go to a restaurant with your friends what do you do you will do a survey you'll ask okay do you want to eat South Indian do you want to eat Chinese do you want to eat Italian and you take uh sort of like a survey okay you ask all your friends where do they want to go and then whatever is the maximum count you go with that so here see three people want to go to Mexican Mexican restaurant so that will be the decision and that is your mode mode is nothing but a frequently occurring value in a data set see more is just some kind of jargon you take any data set and whatever value is uh occurring most number of times that is your mode they use more than in a way in a voting also right if three candidates are in election then you do voting or get the maximum votes they get elected so that is your mood all right now let's look at our data set and try to use all of this in our uh Excel file so I have this Excel file and I have this IMDb rating which is the rating of all the movies and I want to now create a new sheet here okay so that's your new sheet fine and in this new sheet I can have mean median and more and mean for mean the function that you use is average so you use average the name of my movies table is movies and that I am DB rating so my average IMDb rating is 7.94 you can use this percentage here similarly median there is a function median once again movies is my table name and then I am DB rating so my median is 8.1 which means if I take all the ratings and kind of arrange them in ascending order in the middle you will find 8.1 and then mod is once again I will do movies IMDb rating so 8.4 is the most frequently occurring trade so you can go to movies data set you click here and you can sort these values uh let's say smallest to largest you get all these values and you'll realize 8.4 is appearing four times say 8.5 is 2 times 8.6 so that is your mode mod not more mod okay so mean median mode we just cover Basics these are a very very easy concept folks so I hope it is clear in the next video we are going to cover few more uh statistics Concepts in this video we will understand variance and standard deviation using a very simple example the definition of variance is it measures how far each number is from each other in a given data set let me give you a simple example of two countries these are imaginary countries juganda and krans and you are doing study on the income level of people in these two countries usually countries have per capita income level which is an average income of a person in a given country and if that per capita income is higher it means countries economically doing better now in this two country let's say total eight people live and when you collected their income level they were like this and these are by the way thousand US dollars so nishit is earning 71 000 US dollar every year and the average income level in Uganda is 62 000 a year in France these numbers are these but the mean is same so the average income in both the countries is 62 000 is that an indication that both countries are doing similar in terms of their economic performance well if you look at the individual income level you get a different picture here the income levels are more or less the same so see 62 is an average and most of the people's income are around 62 like plus and minus five or ten thousand dollars but in France the income inequality is higher the data points are very much far apart so you can say that in France the variance is higher in Uganda the variance is lower now you got an idea okay variance is higher or lower but what is it is it 10 50 20 what is it that has to be some number so let me show you that number using Excel so here I have shown their income level I will calculate the average income by using the average formula okay so that's the average income in Uganda and France and now I will assign this a label number so see here what I have done is you can say mean what that will do is let me explain so but first trans mean so I have given them a label okay in using this label you can refer to that cell so to calculate how far an income is from each other you can first measure how far individual's income is from the mean okay so let's say there is this data point so you can measure difference between mean and this so that you can calculate by saying okay individual income minus this mean and this means is juganda mean okay so individual income minus Uganda mean so that's your difference and when you do this you get all these differences now one way of calculating variance that you can think about is okay you find out how far they are from mean and then sum them up but when you sum them up see you are getting 0 why because there is negative number and there is positive number so negative and positive is canceling out the effect so to tackle this problem you can maybe take absolute value so forget the negative sign you can just take absolute value but even if you take absolute value it will have some problems and I will discuss that problem in a minute or so but let's just use different approach which is you can square these numbers so when you square any number you get positive number as a result so you are canceling out that negative effect okay so I am going to square this number this is how you square it so numbers are square okay numbers are squared and then you can make a total of it so total of these square numbers is this the count of these numbers how many numbers are there total it is this many and then when you divide this total by count you get variance so this is what variance is okay let me highlight so the variance for Uganda income is 30. and when you do the same thing for this here okay qurans mean when you do this and then the square I'll explain why we do Square okay so just hold on and here when you do total it is the sum formula count is count once again this and the variance look at the variance here folks the variance is here is so huge it is 8.95 so when you're comparing these two numbers okay you get an idea on how far data points are so let's say someone comes to you and they don't give you all this information this simply make this statement that in Uganda the average income level is 62 000 but variance is 30. whereas in Clans average income level is 62 000 but the variance is 8.95 just by hearing this you can immediately say that in France the income inequality is higher and the data points are far apart and there is a lot of variance in the overall data set so now let me tell you why do we Square it why don't we just use absolute value so I asked this question to chat GPD and it said three reasons number one is obvious you want to make every number positive number two is very important you want to penalize the difference so if the difference is higher you want to penalize it and the way you can penalize It Is by squaring it so let's say you have a class in your school and let's say you go at nine o'clock so nine o'clock is the usual time but if you go at nine and five you are five minutes late so then teacher will punish you they will say make one round of our football ground that is your punishment but if you come at 9 10 you have to make five rounds so see as five five minutes increases your penalty increases substantially and if you come at 9 15 you have to make 20 rounds so nine five just one round nine ten five rounds and 9 15 20 rounds so as that deviation increases you want to penalize it and you want to increase the effect so that way you can get a better understanding of how far or how what is the spread in data points um okay and the third reason is calculus so calculus works better with the square numbers uh discussing calculus is beyond the scope of this tutorial but I think it is some point in my YouTube channel maybe I'll make a separate video on this but in differential calculus uh it will help if you have the square okay so that's the formula and when you look at this formula you know you think oh my God this is so complex but it's not right see here x i is each data point each data point is what 7162 extra you are subtracting mean from that data point that's what we did we took the individual leader Point subtracted the mean then we square that number so this is the square then we divide it by n which is C which is the count by the way this is a sum okay this is the sum so we summed it then divided it and that's when we got the variance so the formula is clear now the question is what does 13 895 mean does it mean eight ninety five thousand dollar Mi 895 000 apart from the average no that's not what it means you square the number so if you want to make any sense out of it you have to scale it down so what is the reverse of square tell me square root square root is the reverse of square so if I do square root of this number what I get is standard deviation so standard deviation is nothing but that okay so that is my standard deviation so my standard deviation is 5.5 here and same way STV Del P by P it means population okay population and mean okay I can just use a square root I'll show you the ready-made formula later on foreign so the standard deviation here is 30 close to 30. so what this means is in trans on average person's income is 30 000 plus and minus from the mean whereas in juvanda person's income is around 5000 plus and minus so that you can get some sense out of this number if you use standard deviation but for variance it doesn't give you that that kind of concrete information so standard deviation is nothing but the square root of variance and there are real life use cases of variance and standard deviation which is stock market volatility let's say you want to invest into stock Tata Motors and Tesla you want to actually invest in one stock but you are trying to figure out which stock is better you take last one month average of both of this stock and you find that both of these stocks were 100 on average so now they are same so does it mean I can invest in any stock no you have to look at the volatility in the stock how much stock is stable so let's say Tesla's stock one day is 100 second days 200 third is 34 days 200 is going up and down up and down up and down so their instability or volatility is higher in Tesla stock okay so it is short of like this see Tesla stock just imagine it's 100 average but one rate is this much then it is this much then it is this this month this much so although average price is same for test line Tata Motors you want to invest in Tata Motors because Tata Motors might be stable right 100 one day price is 100 then 100 into 99 104 Etc so various is used in stock market variance is also can be used in let's say you have a child and you want to admit them to a school now you have two schools daily public school and New York public school which school I should um you know send my child to well you can do similar analysis so just forget this is income level right let's say these are the scores of a student so let's say you want to admit your student in 10th Grade and you look at both the schools and their current 10th grade student performance let's say in their current Trend grade in a daily public school lesson is it where are all they are studying these are their percentage score their average percentage score is 62. in New York Public School these guys are studying but their average score is 62 percent both are same but look at their individual scores it is too volatile right like say Mohan and Ahmad are getting higher score maybe because individually they are good and their parents are taking good care of them maybe the school is not that great you know teachers are not that great but in in this case if everyone is getting good marks it means students are good but then there is something that it tells about the teaching quality in that school so you can make a decision by looking at the variance so send your kid to a school which has a low variance all right so now in our movies data set we'll quickly calculate variance and standard deviation of IMDb rating and we'll use the ready-made Excel formula so for variance the formula is VAR dot P so there is variance for the entire population which means all the data points there is variance for sample populance which means only few data points we are going to use all the data points so movies IMDb rating so variance is 1.4 and then standard deviation once again for the entire population is movies IMDb rating okay so on average my movie rating is 1.17 plus and minus you will see from the movies itself you can figure out see the ratings are kind of in a similar range between seven to nine there is just one outlier but otherwise they are kind of similar that's why the variance and standard deviation is lower all right that's all we had for this session I will see you in the next lecture where we will discuss correlation we will talk about correlation in this video on the screen I have an Excel file where I have shown the square foot of a property and the price now correlation tells you how related two variables are so here is my data set and two variables are square fit and price and you want to know how related they are now using common sense you can say they are closely related because as the square feet increases the property price will increase but let's approve this statistically using correlation formula and by the way this is a real data of rajajinagar area in Bangalore that I have received from this makan.com website which is a property website similar to Zillow in us and I got few data points here and that's what I have here and I'm going to now foreign relation here and the formula you can use is Corel where you specify both of your data point arrays and it tells you there is an 86 percent correlation if the correlation is one it means they are very closely related so the range is I think negative one to one zero means they are not related at all and negative one means they are negatively related we will look at negative correlation in detail but here it clearly shows 0.86 which is very close to 1 which means these two variables square feet and price are very much related and if you plot a scatter plot by going into insert click in this you will find that they are correlated using a linear equation so if you insert a straight line here say it will look something like this where the data points are mostly around that straight line so if there is a line and the data points are around that it means they are related so on the x-axis we have the square foot Y axis we have the price and this shows positive correlation where positive correlation means if one variable increases the other variable increases as well uh in the same direction and if one variable decreases of course the other variable decreases let's talk about circle area where I have the radius of a circle and you know the formula of the circles area which is pi R square so Pi is 22 by 7 into R square meaning Phi into Phi okay now tell me what will be the correlation between radius and area if you use common sense it is pretty straightforward okay so let's show this so correlation between this comma this close to 1 isn't it a common sense as the circle of the radius increases the area will increase as well let's look at Health Data now where I have people's height in centimeter and their fasting blood sugar level now tell me is there any correlation between people's height and their sugar level whether they will have diabetes or not is it dependent on height no correct heightened the diabetes Health level has no relation almost no relation there could be some relation I don't know I'm not a doctor but the common sense says no I mean if you are getting diabetes or not it is dependent on your genetics your eating habits your exercise Etc so let's see so I'm gonna use correlation here between this comma this and you find is this 0.2 there'll be some correlation obviously but see it's very like like far away from one and it is close to closer to zero which means these two variables are not related and if you plot a scatter plot of the same thing here you will find that data points are all over the place see there is no like linear equation that you can draw from here the other example I want to use is employee happiness now tell me what is the correlation between commute time and the employee happiness common sense says that if the person is commuting longer hours let's say this person is commuting two hour one way 120 Minutes is two hour one way he goes towards then he works and then he comes back two hours again he or she their happiness index is lower this is out of 10. so 2 out of 10 means employee is not happy the happiness as such depends on other factors as well such as salary what kind of work you are doing you know your relationship with your manager and your colleagues Etc but just to keep things simple let's say your happiness depends only on commute it's very clear see this person is having zero commit which means he's working from home and their happiness index is very high 9 out of ten this person is working 10 minutes for a happiness index is 8 out of 10. 64 out of 10. let's plot a correlation here between the two do you have any guess what will be with output pause the video and make a guess negative close to negative one okay so if you plot this here yeah once again I'm gonna upload a scatter plot see here you can almost draw a line straight line I would say and that data points are around that line only so that means there is a correlation but it's a negative correlation and then there are types of correlations such as strong positive correlation strong negative correlation you can just Google and read about them uh in real life the correlations can be helpful in a lot of scenarios such as investment let's say you want to invest 100 rupees how should you invest and you don't want to take bigger risk so then you need to keep a balance between let's say stocks and gold because stocks and gold seem to have a negative correlation so usually if talks are going up more people put money in stocks and the gold price goes down so if you go to any financial advisor they will say let's hedge it hedge meaning you want to you know put your money in a balanced way in both these assets so when the stock market is down gold my prices will be up and when stock market is up gold price is down so overall your portfolio will still perform so this is a very balanced portfolio if you want to take risk invest 100 in stock if you don't want to take any risk then invest 100 in gold and other uh let's say cash and bonds things like that it is called the debt instrument so when you look at your portfolio if you are not a big risk taker if you believe in a steady return you need to have negative correlation within the assets in your portfolio that is one use case there are many other use cases as well all right so that was about correlation we covered few statistics topics uh in this particular chapter and as I have said before learning is a continuous process and you can never say I have learned data analyst for data scientist skills you have to have this to student mindset and always whenever you're facing any new topic you know take your time put a pause on your work and then do some reading read statistics try it out in Excel python Etc and keep on improving your skills foreign thank you