Transcript for:
Lecture on AI, Digitalization, and Data

[Music] Hey, good evening Aron. Good evening all. Yeah, morning. Okay, we'll wait a couple of minutes, folks, so that a few more people can join, and then we can start. [Music]

Professor, while the others join, I have some good queries from the last class.

Yes, please, definitely. You're asking me to start an AI companion; I don't think it should be an issue. Okay, the meeting questions feature is on. Got it.

Okay, so Professor, can I ask?

Yes, please.

It is regarding the digital twin, sir. As I understand it, for the digital twin we need sensors and then we need the data. So why do we need to generate the digital twin? We can just use the data directly with an AI algorithm and predict.

I agree, but the advantage of a digital twin is that you are replicating exactly what is happening in real time on the ground; you are not predicting anything. Prediction is the second part. Excellent question to bring up. What a digital twin initially does is replicate what is happening in the field at that point, in real time. Now, you can use AI to predict from it, but take the Google Maps example I gave you: if I can recreate what is happening in real time in a digital way, isn't that better than predicting it? Once I have that, I can ask what will happen if my traffic increases threefold, and I can use AI to predict that. Yes, I agree. Go ahead. Sorry, no problem. So for what is happening today, you could predict it, but the advantage of doing it in real time is that I get the real-time picture in digital form. That is what a digital twin initially is; then they use AI to do the other part.

Okay. So regarding the map, I understood that we can see the
streets and the other details.

Anything, for that matter. You can do the same thing for an airplane; they are doing it for airplanes.

Okay. So a digital twin as a standalone is not gen AI, right? It is just a...

No, no, that is not gen AI. We will talk a little about gen AI today.

Okay, sir.

Absolutely, we'll talk about gen AI today and other things, since you brought it up anyway. Let's do this, guys: one more minute. At 9:05 we will start. I generally give five minutes for people to join, so let's wait one more minute and then we can start. In the meanwhile, excellent question you brought up, and we will talk about it. Let me admit people and share my screen. Why can't I see share screen? I can see it here. Okay, one second, sorry. Hey Richard, hey guys. Can everybody see my screen?

Yes, sir.

Okay. Oh, wait a minute. Lakmi wanted to join and I thought I had clicked the wrong button, the one that says don't join. Luckily not. Sorry, sometimes if you hit the wrong button, things happen. Okay, let's start, guys.

So today I'm going to do two things. Obviously I'm going to do the lesson, but I'm also going to talk to you about assignment one. I'm not going to talk about it at the beginning; I'll probably do it after the first hour. It's not that I can't talk about it at the beginning. The reason is that if somebody is late by 30 minutes for whatever reason, they can still join the discussion of the assignment. So after the first hour I'll talk about your first assignment. I'll have a document and I will go through it in detail, and in every class we'll spend a little time going through the document so that you get an idea of what to do. So
let's talk about it after the first hour, or maybe an hour and a half in, so that anyone who is late for whatever reason can also join in. That's it. With that, let's start today's lesson.

First of all, good morning, evening, or afternoon, guys, depending on the region of the world you are dialing in from. What did we do in our last two classes? We went through what I would call, rather than the Industrial Revolution, the technological leaps that humanity went through: starting from the steam engine and the spinning wheel, to electricity, to finance and the new capitalism, to chips, to AI.

Today I'm going to spend a little time, and I know Dr. Prof also spent time on this, but slightly differently, talking about basic terminology such as digitization and digitalization, what AI is, what linear regression is, and so on, because you will have a class that goes into detail about it. The reason I want to spend a little time on this is that later, when I talk about how to select an AI project and what to consider, you should have an idea of what I am referring to. These things are very, very important, so we'll spend some time on them, and then in the middle I'll talk about your first assignment.

Now, with that, a couple of quotes. "A breakthrough in machine learning would be worth ten Microsofts" is what Bill Gates said. "It is one of the most important things humanity is working on; it's more profound than electricity or fire." Big statements, right,
from very smart guys. Now, one thing is for sure, guys, and I think I put this up last time as well: there is no technology without money. If you look at this, you can see the global investments going into AI. Maybe I have the wrong slide; I have another one, and when I post this I will include 2023 as well. Apologies, I don't know why I didn't put 2023 here, but it has gone up; it's about 170 billion or 80 billion. So if you look at this, it is a hockey stick, which means there is huge investment going on here.

This one also shows the investments going into AI; again, a hockey stick. There have been a couple of years where it made a difference. For example, in AI, 2011-2012 was the first breakthrough. You can see 2021 was a breakthrough again, this two billion to 1 billion, and then this is the COVID period. Then there is a breakthrough, and there is a reason why, which I think most of you can guess. And 2023 is huge; in 2024 it has already crossed $8 billion in just three months. The reason I put that up is that money has to follow technology. There is no technology without money, like it or not. Questions, comments?

The second thing I wanted to talk about is where these investments are going. There are three areas where these investments are happening. Can anybody guess? Any thoughts on where the investments are happening?

As in country?

Not countries; areas. In which areas are the investments happening?

The hardware, availability of things like GPU hardware.

Okay, chip development.

Human resources.

Okay, human resources. Very good, excellent. Where else? Okay, so there are three areas where the amount
is... I'm sorry, go ahead.

Visual and multimedia technology: image, media, voice.

Excellent. So let me tell you, there are three fundamental areas where it is happening, and all of you are right; I'm not saying that what you said is wrong.

The first area is startups in general: AI for finance, generative AI for healthcare, AI for supply chain. This is where chips, technology, and vision come into the picture; if you're doing computer vision, it falls here. I'm talking about big buckets.

The second area where the investments are happening, and this is also very important, is the development of large models: model development, algorithm development. People are coming up with these large models, ChatGPT, Gemini; a lot of people are working on diffusion models, Sora, whatever the latest is, and they are amazingly good. That is the second area. Again, I'm talking about big buckets.

The third area, which I think will form the underlying layer for the next four classes, is this. In most of the companies you work for, data scientists form a very small piece, especially if you are a non-tech company. Say you're in supply chain, finance, or another such company: data science forms a small piece of that entire investment, and 80% of the company are not data scientists. So one of the areas where companies are investing a lot of money is the basic education of these employees, because at the end of the day AI is going to be in every area you touch. Everybody will be using it: some of them will be using it as a tool, some of them will be using it as a
development, and some of them will be using it to actually create AI. So they want to make sure that every person in the company understands what AI is and every leader understands what AI is. That is what we are going to spend a lot of time on in the next six to eight classes: how do I select an AI project, how do I select the right people, and so on. That is going to be one of the main focuses of this course. So that is the third area where the money is going, and it's a very, very important area, and you as leaders need to understand that.

With that, any other questions or comments before I go to the next slide? This is an important slide; maybe I've talked about it before, but I thought let me do this. If you look at it, 30% of all general publications... I'm sorry, somebody had a question. Go ahead, please.

Sorry, Professor, this is Rahul here. I just wanted to find out: from your discussion, the investments are mainly in new development activities, but would hardware development not be the biggest investment being made?

Absolutely. When I say startups, all of that comes under that bucket; it could be new hardware or new software. When I say new algorithms, all of that, basically the creation of anything new, comes into that bucket. Absolutely, you're right.

Understood, thanks.

So basically I gave you three areas: startups specializing in certain areas, infrastructure and algorithm development, and education. Got it? Absolutely, good question.

Now, here is another way, and I'm telling you this is a very, very important way, for all of you to see how serious companies are and to track what companies are interested in. You
see, it's very difficult. One of the things about trends, guys, and this is very important, is that every day you read something in the newspaper or in some blog: oh, this is going to change the world. How do you know what companies are really interested in? Because when you ask, no company says I'm not working on this. Everybody says yeah, we're looking into it, we're working on it, because nobody wants to sound like they are being left out. It's FOMO, right, a major FOMO: fear of missing out. So every company says yeah, we're working on it. But to see where they are really, seriously putting their money, one thing you can look at is publications. 30% of all publications in the last 10 years, or at least the last five years to be honest, are in AI. When I say publications, I mean STEM publications, obviously, not non-STEM publications.

More importantly, and this is a good way to look at it, these five companies, which I'm sure everybody recognizes, Apple, Google, Amazon, Facebook, and Microsoft, have more patents and more papers in AI and generative AI than the top 100 US universities. This is something you can look up online; the patent database is an open database. It's a very important stat. I might have told you this earlier, but I'm repeating it because it indicates that they are very serious about these technologies; if not, they would never put money behind them. So if a lot of patents are coming out in a particular area, that means they are working very seriously on it and they see a lot of ROI in it. It's a very important stat that all of you can track. So if you
want to know what the trends are, look into the patent database and the publications database, and if the majority of these companies are working on something, you will see that they are serious about it.

And the last one: if you look at the number of machine learning systems and models that have come out, and I only took it up to 2022, industry is outpacing academia. That means they are heavily interested in creating these. Generally, when a field is still theoretical, you see a lot coming from academia and very little from industry. For example, with algorithms, initially every PhD student thinks he has the better mousetrap, so he comes up with a slightly better random forest or a slightly better linear regression model, thinking it will get him a PhD. But now more is coming from industry than from academia, which means that even the small gain these companies are seeing is big enough for them to make a difference, whatever the model is. So these are the kinds of trends you need to catch, good trends that tell you what is happening in the industry. Questions, comments on the slide?

The last one is Nobel laureates. Most Nobel laureates today are, in a sense, data scientists, in that they are using data science for their research.

Okay, the reason we are doing this course is this, and it is something we will spend quite a bit of time on. There are huge amounts of data, and a lot of AI
projects fail, not because the technology is not there, but because there is not good coordination between the domain experts and the data scientists. As a leader, this is a very, very important thing for you to understand: what do you need to do in terms of identifying the right project and the right teams? What is your role, both as a leader and as a domain expert? If you're a data scientist, what is your role? We'll talk about it throughout the next four to five classes.

In order to do that, the first thing I want to spend a little time on, and I know all of you know about it, is terminology. The first piece: this is something I showed for 2023, and you can find the 2024 version. This is a Gartner chart that talks about the impact and the trends that are happening. I think Gartner comes out with it around March or April; these are the trends expected for 2023. The one in the middle, the light orange one, is what is happening now. As you go outward, as you can read, the whitish ones are one to three years out, three to six years out, and so on. If you see what's happening at the edge, as you rightly said: foundation models, vision, edge AI. We will talk about all of it, but from a leadership perspective: what they are.

Then look at the outer ones, like multimodal UI. What is multimodal UI? Earlier, I don't know how many of you have used ChatGPT, you could only give text. Today, with not just ChatGPT but a lot of these models, you can give a
picture and say analyze this picture, you can give a voice recording and say analyze this voice recording, or you can give something in English and ask it to create a picture. So multimodal means that through the same interface you can give voice, video, or text as input. That is what is called multimodal.

Digital twin we talked about. I could spend the entire two and a half hours talking about each one of these. You see responsible AI: you will have a course called Responsible AI, where they talk about how you reduce bias and so on. So one of the modules is on responsible AI. The chart says it's going to happen in the next three to six years; in fact it's already happening, but it is going to be a very big thing.

Similarly, synthetic data. What is that? When I come to gen AI I will talk about it. A very important, and actually scary, thing people are saying is that whatever has been produced by humans until now has all been used for training. So how do I train the future models? Where do I get the data? Everything that has been written until now has already been trained on, and what is going to be produced is not much compared to what has been produced until now. So they are creating synthetic data: they are using generative models to create different forms of data, which is called synthetic data.

There are a lot of things here and we'll talk about them; I don't want to go into each one, because that would be huge, but this gives you an idea of what is happening. Any questions or comments? If you have one particular topic, ask me. If not, guys, I'm not going to spend too much time on this; I just wanted to give you
basically what is happening.

One quick question out of curiosity, Professor: what is neuromorphic computing?

Absolutely, I'll talk about it. And Parag, why did you ask what SABIC is? Where is the word SABIC?

It was on the previous slide.

Oh, sorry, I'll remove it. Thank you for reminding me; I didn't realize it had the word SABIC there. Not here, the last point? No, the next slide. This one, yeah: our goal is to ensure this goes to a much better... okay. No, SABIC is a... I'll tell you why I used that; give me one minute. Not a big deal. And I'm sorry, I will definitely talk about it. Somebody asked me about neuro... what was the question? Neuromorphic computing, right?

Yes.

What is it? Can you guess?

Something related to neural networks and trying to simulate it?

Absolutely, excellent, good point. Obviously you can also Google neuromorphic computing and get it; no big deal. One of the things people talk about with neuromorphic computing is this: today we talk about deep learning, and we are sort of trying to relate it to what our brain does. One of the things people are also thinking about is whether we can read the thoughts of humans. Something related to that is what neuromorphic computing is all about. Any other questions or comments?

One more, sorry, Professor: model compression.

What is model compression? Excellent question. Hold on to that question until I come to gen AI and then I will talk about it. Is that okay with you? It's not that I don't want to answer it today. When I come to gen AI, please again
ask. Thank you. 99% I will not remember, so please ask that question and I will talk about it. There is a reason why.

Now, there are these two words, digitization and digitalization. Can anybody tell me the difference?

Digitization is just converting your non-digital asset into a digital format, for example converting your documents into a soft copy. Digitalization is converting that soft-copy data into a data format that can be processed in some way or other.

Okay, you're close; in a way you're right. Good, excellent. Any other thoughts?

Digitalization is about the entire process: whether it is an industry, a company, or manufacturing, you convert it into an entirely digital process. So digitization is what we have been doing specifically...

I think digitization... Samuel, one second, guys, I'm sorry, apologies. Samuel, on gen AI: when I come to gen AI I'll definitely answer your question. Excellent question. Go ahead, I'm sorry, I didn't want to interrupt you.

Digitization is what we have already been doing, like Rahul said: we take any physical thing and convert it into digital form, whether it is artifacts, documents, or computer programs. Digitalization is the entire functioning, whether of an institution, an industry, or manufacturing.

In a way you're right; I'm not denying what you're saying. Somebody else was saying something; I'm sorry, go ahead.

Digitization focuses on converting analog information into digital information, and digitalization is about using digital technology to improve processes, workflows, and the business model.

Okay, anything else?

I was saying that digitization mostly focuses on converting and recording the data, and digitalization is related to developing the processes and modifying the workflows around them to improve the manual systems involved in the process
structures.

I think if you combine all of your answers, you've got it, but let me explain. All of you are right.

We can give a very short answer: digitization is the conversion process and digitalization is the transformation that happens after.

Exactly. So now let us take a small example. Let us say you go on a business trip. I'm sure all of you have gone on a business trip at one time or another in your career; maybe some of you are on one right now as we speak. Let's say you finish your business trip and come back to your office. In most companies you have to turn in an expense report. There are various ways of doing it, and I'm sure each company does it slightly differently, but in most companies there is a web interface with a form where you type it in: this is my food expense, this is my transportation expense, and so on. Then you press the enter button and it automatically goes to your manager, and the manager approves it. If the amount is large, it may go to a second person; if not, it goes directly from your manager to the finance team, and the finance team approves it. Then, depending on which part of the world you are in, if your paycheck comes twice a month you get it in one of those, and if it is once a month you get it then. Whenever it is, you get your money. Now, in this process, did you have to physically take
the form from you to your manager, get it approved, and then take the form to finance? No, that part of the process was done automatically. But look at the entire process: did your hotel receipts automatically show up on your expense report? Did your Uber bills automatically show up? Say you took a metro or a train; for example, if you're traveling in Europe, it is very common to take trains from one country or city to another. Did those tickets you purchased automatically show up on your expense report? No, right? Some of them may have, but not all. You had to scan them, take a picture, or do something about it.

So digitization is basically automation of one process, and digitalization is automation of all the processes end to end. When a company says it is going for digitalization, it means they want to automate the process end to end. Some of you described one part of it, some of you described the end-to-end part; it's a culmination, and all of you are right in your own ways. That's why I said if you put all your answers together, you're right. That is the major difference between digitization and digitalization. Questions, thoughts?

Okay, now that this has been set up, let's talk about the technologies used in the digitalization process. Any digitalization process, as all of you know, requires collection of data. Data can be in multiple formats; it could be in your Excel sheets. But there is also a way of collecting the data digitally. One of the technologies used to collect data, what should I
say, is called IoT. IoT stands for Internet of Things. For example, if you have a smartwatch, it tells you how many steps you walked and what your pulse rate is, and that automatically gets stored in some cloud. So you are digitally collecting data. That is the first technology, IoT.

Now, once you digitally collect the data, what do you do next?

Analyze the data.

Even before you analyze the data, what do you do?

Store it.

You store it, absolutely. That's where the next technology comes in, called cloud. Cloud can do multiple things, but one of the primary things is storing data in a common place. Now, cloud can be internal or external. What do I mean by that? Your own company can have its own cloud network, or you can use one of the publicly available clouds, such as Amazon AWS, Azure, or GCP, which is the Google Cloud Platform. I'm just naming the big ones; you can use anybody.

Why do a lot of companies use an internal cloud rather than an external one? Because of regulations. For example, is anybody in this class from Saudi Arabia?

Yes, Professor.

You are from Saudi Arabia, right? So Saudi Arabia has a rule that says data of Saudi Arabia should be in Saudi Arabia; you cannot store the data outside Saudi Arabia. Similarly, in India there is a rule for telecom data: Indian telecom data, meaning the data about people, has to be kept in India. So, to satisfy those rules: let us say I use Amazon in Saudi. What happens? On the Amazon cloud platform, the backup for Saudi could be in Mumbai, the
backup for Mumbai could be in Singapore, the backup for Singapore could be in the Netherlands, and the backup for the Netherlands could be in San Francisco. I'm just using five cities; it doesn't matter where it physically is. Now, what does a backup mean? It's a complete replication of your entire data. So obviously your data will sit outside the country; that's the whole point of having a backup. That's the reason why a lot of these companies, in India and slowly in Saudi Arabia as well, did what Amazon and Microsoft Azure did in India: they started opening their own data centers in Mumbai and other cities, and they have said, don't worry, we will store it here with multiple backups within India, and we will not put it onto the global platform. That is how they are handling it, because of all these regulations. Why are these regulations there? It's partly political, partly strategic; you can look at it however you want.

Now that you have stored it, what do you do next? I collect the data, I store the data; what do I do next?

Process it. Clean it.

Absolutely. That is AI, ML, and DS. AI stands for artificial intelligence, ML stands for machine learning, and DS stands for data science. What they mean I'll talk about in a minute, but exactly: you store it and you process it. Once you store it and process it, what do you do next?

Analytics.

That's what AI, ML, and DS are all about. I'm sorry, you're absolutely right: you store it, clean it, analyze it, you do everything. Then what do you do next? You use it for decision making.

Visualize.

Absolutely, you use it for decision making. Excellent. The next technology is called BI, AR, and VR. BI stands for business intelligence, AR stands for augmented reality, and VR stands
for virtual reality. Now, what is business intelligence? Business intelligence is representing data in graphs, in tables, or in any other visual form. I'm representing the output, either the original data or my analysis, in a form that everybody can understand. That is called business intelligence, and there are tools for that; for example, you must have heard about Power BI, you must have heard about Tableau. There are a lot of tools that do that. The second one is called AR. AR stands for augmented reality. What does the word "augment" mean in English? Add on. Add on, exactly. And assist, exactly, as Muhammad said. Assist, add on, exactly. Now, I'm from India, so obviously I'm a big cricket fan. A lot of you may or may not be; maybe a lot of you are soccer fans or tennis fans, whatever sport you follow. Let us take soccer and cricket. I'm sure a lot of you have seen a soccer game or a cricket match on television. The commentator will draw yellow lines, red lines, and everything on the screen, right? Or take American football: if you watch it, they draw the yellow lines on the field: this is the line for the first down, this is the line for the second down, this is the line of scrimmage. They actually show all that on the television. Or, for example, if a receiver runs in a zigzag after he receives the ball, they can show the path he ran. So there are a lot of things shown on the television that don't really exist on the field, but as the viewer you see the patterns that are happening, right? So
that is called augmented reality. The reality is the actual field and the game; you are adding something on top of it. That's called augmented reality. Virtual reality means you're virtually creating something. For example, your kids, or maybe you yourself, play computer games where you wear a headset and start playing. There are a lot of practical applications of virtual reality also. For example, Shell uses it to train their employees for oil spills, and Walmart uses it to train their employees for crowd management during Thanksgiving. So that is what BI, AR, and VR are. Now that I have presented the data, what do I do next? Decision making. Exactly. And one of the forms of decision making, you're absolutely right, is that you can automate certain processes, guys. Action: this is another very important thing. You take action, and you can automate certain processes. Identify areas of improvement. Yes, exactly, identify areas of improvement, but mostly automate certain processes. By automation, do you mean robotic process automation, or other tools? Yes. What I am talking about is this. Let us say you are working in HR; it doesn't have to be HR, I'm just using HR as an example. People are constantly asking, "What is my leave policy? What is my policy for this or that?" Normally someone in HR has to read each of these questions and answer it. Instead, you can have a process that reads the email and automatically gives the answer, and that technology is called RPA, robotic process automation. The best example: nowadays you go to any website, and the first thing that pops up is "Hi, my name is John," depending on the country, right? If you're in the US it
says "My name is John," or whatever the name is; if you're in India it'll say "Hi, my name is Ravi." I'm just using some random names. Whatever name it comes up with, it says, "How can I help you?" You start asking questions and it gives you some answers. Now ask it something complicated: "Oh, let me get my manager." Technically that's not a human that is talking; it's a bot. When it says "let me get my manager" is when you will actually get a human talking. That is robotic process automation. Now that robotic process automation has happened, the last part is very important. There are two other technologies. One is called cybersecurity: you have to make sure that you secure the whole thing, that your data is secure, so that your data doesn't leak or your credit card information doesn't get out if you're doing e-commerce. And the last one is called blockchain. What is blockchain? Blockchain is a technology that lets you do two things. One, if you say that you have done a certain thing, it helps you verify that yes, you have done that thing. And second, it helps you connect systems that are of different types. That is what blockchain does. All these six or seven technologies together form what is called digitalization. Now, in digitalization you don't have to use all six or seven. If the data is already in the cloud, you may not use IoT. Or you may not do any RPA; you may only do cloud, AI/ML, BI, and cybersecurity. It doesn't mean all these technologies have to be used for your project, but a combination of them is what is called a digitalization process. Questions, comments? And mostly today, and for all our
classes, we will be concentrating on the block called AI, ML, and DS, but I just wanted to let you know that this doesn't exist in a vacuum. There is something before it and something after it. There is a process that happens end to end, and that is something you guys need to understand. Questions, comments? So when you say that it does not exist in a vacuum, that's very interesting to understand. My question is, it's always interrelated, right? It works in simulation, is it? Absolutely. I'm sorry, it works in what? In simulation with any of the other technologies. What was the word, simulation? What I meant to say is that without the data, AI cannot exist, and for the data you need to collect it in one form or the other. That's what I meant when I said it doesn't stand alone. And once I make some prediction, how do I tell the world? How do I convert that prediction into a business decision? That is what BI, AR, and VR help me do. If not, this AI is useless to me, right? I get some output, but not everybody is a data scientist who understands what it means, so you have to convert it into business language. That's what we will also talk about in the next six: first of all, how do I convert a business problem into a data science problem, and once I have converted it into a data science problem, how do I convert the result back into a business understanding? And once I get the business understanding, I take certain decisions on it and say, "Hey, these things I can actually automate, because this is a regular thing." That's where RPA comes in. So that is what I mean when I say it does not stand alone. Yeah, thank you, sir, thank you so much. Exactly. Any other questions or comments? Okay, now these three words; let's talk about them. The first word is statistics. Now, statistics has been there
since the day humans have been there, guys. Every day you do statistics, whether you know it or not. Somebody asks you, "What is the average mileage of your car?" What do you do? You fill up with so many gallons of gas, or so many liters of petrol, and it goes so many miles or kilometers, depending upon which country you are in. Let's say you filled up with 20 liters of petrol and it goes 200 kilometers; I'm just randomly picking numbers. That gives you 10 kilometers per liter. How do you get that? 200 divided by 20. Or somebody asks you, "What is the average price of a house in your neighborhood?" What do you do? You take the last 10 houses or the last 100 houses sold, you know the prices, you calculate, and you say this is the average. So you have been doing statistics. If you look at statistics, its initial form was basically to, not predict, but analyze the past. What do I mean by that? Let us say I'm in a classroom with 20 students, and I take the height of all 20 students. Now that I know the height of every student, what can I do? I can calculate the tallest height in the class, the shortest height, the average height, and all that. That is analyzing data from the past. Do you guys agree? I have a class, and I'm analyzing data from the past. Now suppose instead I have to predict: what is the height of the next person entering the class? That is about the future. Based on the past data I have collected on the 20 students, can I predict the height of the next person entering the class? That is the future. Now, in statistics, yes, they started talking about linear regression and everything, which, you
know, AI basically took over and built on. But generally, when you look at it initially, in the 1800s, statistics was about summarizing the past. From that, the next big thing that happened, in the 1930s and 1940s, was the Second World War. During that time there was a very famous gentleman, today a legend in computing, called Alan Turing. I'm sure all of you must have heard of the Turing model, the Turing machine, and all that; there's a lot to talk about. He was the first to develop a machine that was able to decode the code of the Germans. The machine that the Germans had developed was called Enigma, and it was very difficult to break. Until that point, all the codebreakers used to work with pencil and paper. That was the first time a machine was actually developed for this, because the code was so complicated, and with it he was able to decode the messages. A lot of people say that this led to the British and the Allies winning the war. Whether that's right or true I don't know, but they say it was a pretty huge factor, because now they were able to read every German conversation, every German order, everything, since the Germans kept using this same code. Now, until that point, look at all machines. If you look at the Industrial Revolution (that's, by the way, the reason why we covered the first two industrial revolutions in the first two classes), the 1940s were still in the Second Industrial Revolution, not the third, because the third started around the 1970s. If you look at all the machines that were invented, other than a few, they were meant to ease human physical needs. For example, a car, so that you don't have to walk; power tools, so that you don't
have to do the work physically. Most of them. Yes, calculators, but they were all very limited. This was the first time a machine really pushed toward something similar to human intelligence. So people thought, "Oh my God, in the next two years I can replace a human brain," and so the term artificial intelligence came into the picture. People thought, "I can replace a human in the next two to ten years." People tried and tried and tried. It's not easy to replace a human brain, right? The human brain is pretty complicated, whether we like it or not. So what happened in the 1980s is that they started analyzing it and recognized that the human brain has two components: emotional and logical. What is emotional? Let's say that in a classroom or in a group I suddenly shout the word "fire." There is no way I can know how each person reacts. Or, for example, the stock market goes down. If I have 10 people in the same room (stockbrokers are regular humans), we have no idea how the 10 of them will react. Let's say the market suddenly falls by 40% one day, for whatever reason. Some of them may say, "Great opportunity, buy." Some of them say, "Fine, I will stay." Some of them say, "Oh my God, the world has fallen down," and they sell. That is why it is so difficult to mimic the stock market: that human part makes it nearly impossible to say what anybody will do. And so in the 80s they said, "Let's not even worry about that; let's only think about the logical part and see how to go about it." At that point, and we'll talk about it a little later, maybe on the next slide, data itself is divided into two components: structured data and unstructured data. Structured data is anything that can be put in rows and columns. For example, your budget data can be put into rows and columns, right? Or, for example, your
machine data. Anything that can be put in rows and columns is called structured data. Anything that cannot be put in rows and columns is called unstructured data. For example, audio: a voice file cannot be put into rows and columns. A picture can be pasted onto an Excel sheet, but in its raw format it cannot be put into Excel rows and columns. Similarly, text cannot be put into rows and columns. Anything that cannot be put in rows and columns is called unstructured data; anything that can is called structured data. In the 1980s they didn't even know how to deal with unstructured data, so they said, "Let's only worry about structured data." And so, generally, machine learning is the science of analyzing and making predictions on structured data. Then structured data started growing large, so people wanted tools to manage the data. Concepts like big data and cloud came into the picture: how do I manage this huge amount of data? That is where data science comes in. Data science is nothing but bringing these engineering tools in to do machine learning on a large scale. And around 2010, 2011, 2012, thanks to a gentleman called Geoffrey Hinton, whom we'll talk about later, people were able to build really good models to analyze unstructured data: how do I handle images, how do I handle text, how do I analyze voice? We'll talk about how it does that at a very high level, because you have a separate course called Deep Learning that actually covers these things. So that is called deep learning. These are the generally accepted terms: what is machine learning, what is data science, what is deep learning. Now, when I talk I also mix them; sometimes I may use the words AI, ML,
and DS for all of this. But this is the generally accepted norm for what data science is, what machine learning is, and what deep learning is. Questions or comments? I still have not answered the question of what regular AI is and what generative AI is; I will answer that in the later part today. But any questions or comments on this? Okay. The reason I told you about 2010 and 2011, and I will go back for a minute, is this: look here, 2011 is when it started forming that hockey stick. I will tell you what happened in 2011 that changed the world. Deep learning came into the picture, the models changed the world, and hence you see the investments going up. Questions, comments? Okay. Now that we have talked about this, the next thing I want to talk about is important for all of you to understand, guys; maybe you've understood it already, maybe not. What is so special about AI in general? We have been saying the word AI, AI, AI. What is so different about AI compared to your regular programming? I'm assuming a lot of you here are programmers, or at least know the basics of programming; maybe not all of you, but some of you. What is it about AI that makes it so special? Ease of interaction. Ease of interaction, absolutely. I'm sorry, go ahead. So I was going to say, and this is more machine learning, but you don't actually program: you give the system the data and you tell it to come up with the algorithm. Absolutely, excellent point, you're right. All of you are right. I'm not saying that what you are saying is wrong; you're absolutely right. Let's take an example and figure this out. Let's take this word, Hewlett-Packard. I asked six people to write this word, Hewlett-Packard. Now, why did I choose Hewlett-Packard and not Dell or some easy word? Dell is a four-letter word, and chances are most people won't make mistakes with it. So I took a word that was slightly complicated, so that people may make a mistake when they write it. I asked six people to write that word. I'm sure most of you know it's a company, and it has many forms nowadays because they have sold off many divisions. I don't know if any of you are working for one of those avatars of HP, but anyway, it now has a lot of avatars. Now, if I ask six people to write it, they write it these ways, and in some cases there are spelling mistakes. For example, if you look at row E and row F, I have purposely written the words with spelling mistakes. But if I show any human all six of these, I guarantee most of you will guess this is HP, right? Even though I made small mistakes, your brain will still be able to figure out that the word is Hewlett-Packard. I don't think anybody will make a mistake. Now, if I have to do this in a traditional program, how do I do that? I will have to say exactly: if somebody writes it exactly this way, then it is Hewlett-Packard; if somebody writes it exactly that way, it is Hewlett-Packard; and so on. I have to code six different variations. Now let us say I go to a seventh person. They may write it with a slight modification of those six. If I feed that modification to the software program, the program says, "I do not know." Each person may write it slightly differently, and this is just two words. Imagine the entire English language. I took English as an example; it doesn't have to be English. Imagine French, German, Hindi, Arabic,
whatever language you want, with so many permutations and combinations. So traditional programming fails utterly when you do this; it's impossible to scale up in a traditional program. Now, how does AI do it? As people have said, in AI you never give the actual logic. You only give the inputs and the outputs, and the algorithm deciphers the logic from the data you have given. So you give this as the input and you say the output: this is HP, this is HP, this is HP. You never tell it how the word is written; you just give it the ways it's written and say each one is HP. It first tries to learn the logic. Once it learns the logic, it reasons internally: it works out the different patterns and recognizes them. And finally, the output in this case is not whether the word is HP or not; the output is a probability. It may say there is an 80% probability this word is HP, or 92%, or 70%, or 2%. If you give it the word "cat," the output may be 1%, or maybe 0%; let's not say zero, let's say 1%. Now you, as a human, have to decide a cutoff. You might say 80% or 85% is my cutoff: anything higher than 85% I'm going to accept as HP, and anything less than 85% I'm not going to accept as HP. So if you give the word "cat," it may say 1%, and you reject it. I'm just using 85% as a cutoff; it can be whatever cutoff you choose. So that is the thing: in traditional programming you provide the input and the logic, and the program gives you the output. In AI you give the input and the output, and the model decides the logic. That is the difference between AI and traditional programming. Questions, comments? So, sir, I think what is being said here is that AI
basically automates the complex processes and minimizes the downtime by predicting. Yes, it automates it, absolutely, it does that. More than anything, it creates the logic; you don't have to give the logic. Also, the word you used on the previous slides, the term "augment": I think AI augments the intelligence of the human brain. Absolutely, and it is sometimes scary the way it augments. We'll talk about it; I'll give you examples, and if not today, then next class I'll show you some videos. It's amazing the way they've done it. With that, any other questions or comments, folks, on this slide? Sorry, one quick question. Absolutely, go ahead, please. We are saying here that learning through training, deciphering logic, and technically pattern recognition is how AI learns, right? So if we take the example of defined data science algorithms like gradient boosting or gradient descent, then technically, when we are applying those algorithms, that's not totally in the realm of AI, right? They are in the realm of AI. When you talk about algorithms, give me two minutes; not that I don't want to answer it, but in two or three minutes I will talk about exactly what those models and algorithms are, with real examples. Thank you, sir. Absolutely. Anything else? Okay, now let's start with this, guys. Look at any problem in the industry. Let us say, for example, you are all bank employees, and I am a person who comes and applies for a loan. You're all loan officers. I come and apply for a loan; what is the first thing you do? You collect my data. You collect my salary, my age, my credit score, my Social Security number if you are in the US, or the equivalent in other countries; in India we
have an equivalent called Aadhaar, which is similar to your Social Security number; it's an individual number every person has. Then you pull my credit reports and all that. What is the first decision you have to take as a loan officer on my loan? Approve or reject. Excellent: whether I give a loan or not. The answer is yes or no. Do you agree with that? AI can also do this, and that is what we call prediction of a behavior. The answer is yes or no. It doesn't have to be just yes or no; it can have multiple levels. For example, I can classify a song as classical, as rock and roll, as jazz, as hip hop, or as rap. So it doesn't have to be two levels; it can be multiple levels. But basically the answer is yes or no, or one of multiple levels. Now, if the answer is no, what happens? You get a nice letter from the bank that says, "Dear Mr. V, thank you for applying for the loan. Unfortunately, at this point we cannot approve it. We hope you keep in touch, and in future..." blah blah blah, "thank you very much." You get a letter. If the answer is yes, what is the second decision you have to take? The limit. Absolutely, the amount. In the first case the answer was yes or no; in the second case the answer is a number. For example, let us say you applied for $100,000; I'm just taking dollars, the currency doesn't make a difference. The bank may say, "I will only give you $70,000," because you have other loans, or based on your salary, or whatever; or they may give you the entire $100,000. AI can also do that, and it is called prediction of a value. So the first case is prediction of a behavior, and the second case is prediction of a value. The third thing it can do: let us say you want to travel from point A to point B. You want to drive from
Bangalore to Chennai, say, since I'm in India, or Bangalore to Mumbai, or anywhere: San Francisco to LA, Jubail to Riyadh, Shanghai to Beijing, wherever you want to drive. What is the first thing you do the moment you sit in the vehicle? Obviously you put on your seat belt, and before you start the car, you open Google Maps, right? Google Maps is one of the most popular ones. You put in the destination, and it gives you the entire route: take this road, take that road. After one hour it may suddenly say, "By the way, there's an accident on this road, don't take this route," or, "There's a delay on this route, take another route." After some time it may reroute you again. So what it is doing is optimizing your experience at every given point, because conditions change. AI can do the same thing, and we call it finding the best arrangement. It can optimize a process, an experience, whatever. And finally, in AI there are two types of problems: one is called supervised and one is called unsupervised. What is supervised? Supervised means you have both the question and the answer. What do I mean by that? Take the bank example I gave you. How do you create a model for that? You go back to the last five years of data. You have given loans and rejected loans for many customers in your bank. You take that data and feed it to the model. You say, "These are all the parameters: this was the age, this was the salary, this was the credit score, and finally I gave the loan, and this is the amount I gave." For the second person I
put in all the input data and say I rejected the loan. For the third one: this was the loan requested, this was the loan given. And so on. I use all of that to train. But in some cases I may only have the input data; I may not have the output. For example, let us say you have opened a brand-new web store. When each customer comes in, you may collect data about the customer: where are you from, maybe some personal data. But you have no idea what they're going to buy, because you do not have a history; it's a brand-new store. At that point, all you can do is group them, sort of, into high-value customers and low-value customers. You will need some amount of data, six months of data or so, to see what they're buying, and then you can predict: if this type of customer comes in, this is what I need to advertise to them. But in the first six months you have no idea, because you don't have previous data on what they do. So whenever you only have the input and no output, we call it unsupervised. When you have both the input and the output, the question and the answer, we call it supervised. And in the unsupervised case you can learn insights from the data. So these are the four major things it can do. Prediction of a behavior we call classification in AI/ML. Prediction of a value we call regression. Finding the best arrangement we call optimization. Learning insights we call clustering. There's also a word called anomaly; I'll talk about it in a minute. These are the four things it can do. Questions, comments? Classification and clustering sound a little bit similar; can you please explain? Similar, exactly. In classification you have the question and the answer, whereas in clustering you only have the
question; you don't have the answer. Thank you. I'm sorry, do you want me to repeat it? Oh no, thank you. Any other questions or comments? Okay. You have complete courses called Deep Learning and Machine Learning where you go a lot deeper into this. The reason I'm covering it here is that in the next part I'm going to show how to convert a business problem into a data science problem, and I'll be using these words. If you don't know what they mean, it will become a problem, and hence I'm doing this. So basically you have machine learning, and within it supervised learning and unsupervised learning. Supervised learning means you have the question and the answer. Unsupervised means you just have the question and not the answer: you have the input data but not the output. Then you have classification, regression, and optimization. The math, primarily... One minute, I will answer your question; let me finish one more thought, give me two minutes, and then I will answer your question. The math that is used here, guys, is primarily two things: linear algebra and calculus. Remember this term, linear algebra. What does linear algebra mean? Matrices. Remember that, because when I come to infrastructure and chips, and why Nvidia is so popular and why Nvidia became a big, trillion-dollar company, this very small thing called linear algebra plays a very important role, and I will explain that when I get there. These two play the fundamental role: the math is linear algebra and calculus, and it is complicated. So when you were in high school and your teacher said, "Pay attention to your matrices," this is why; one day you can use them. But anyway, that is the math behind it: mainly linear algebra and calculus, very complicated math. I'm not going to
write math equations in this course, so don't worry about it. But it is nice to know the math; if you're interested, it's a lot of fun, and it's amazing how simple things can give you fantastic results. Anyway, coming back: now you can ask me your question. I'm sorry I interrupted you. No, no. Absolutely, go ahead. I had a doubt. You say that there is input and output, but in the other case, unsupervised, there's only input and no output. That is somehow not very clear to me, Professor. Okay, excellent question. During training, in supervised learning, I have both an input and an output. Let's take the bank example I gave you. I have the last five years of data. When I use it for training, I will say: customer one, Venkatesh, took a loan three years back. Forget the name. I put in all the data on Venkatesh: his age, his salary, his credit score, the previous loans he had, his bank balance, anything I need to collect, and the amount he requested. That is my input. Then I tell the system, "This is the loan I gave." Similarly, I go to the next person, who may be John. He applied for a loan four years back, and let's say I rejected it. I give the system all his data and say, "I rejected the loan." The next person could be, I don't know, Philip. Same thing: I approved it, but he had asked for 200,000 and I only gave 100,000. Another person, whoever it is, say Nancy: she had asked for 300,000 and I gave 300,000. So when I'm training, I'm using the input and my past output, what actually happened in my history. Whereas in unsupervised learning, when I train, I do not have my output. I just give my inputs, and hence I am basically able to classify them as... Yes, uh, Professor,
because output is not there, because it has not been trained, but you still need data over a... Absolutely. Whether it has been trained is not the question. Data is not available for the output. I see. The first time I'm opening a shop, my first customer comes in, and I have no idea what my first customer is going to buy, right? Yes, that's true. But I can collect data on the customer. I can do a survey on that customer. I'm just taking a simple example: what is your age, where do you come from, what is your address, what is your telephone number, approximately what is your salary, all the data I can collect. Now, based on the data, I can label that customer as, for example, a high-value customer or a low-value customer, something like that. But I still cannot say this person is going to buy this until about six months have passed, when I have enough data to say these types of customers came in and this is what they bought. After that, if a new customer of this type comes in, I can say chances are he or she will buy this, so let me promote this to this customer. That means for six months it will remain unsupervised learning, and after six months it automatically becomes supervised? Something like that in this case, exactly. Okay, thank you so much, now it's clear to me. Okay, any other questions or comments? The next thing I want to spend a little bit of time talking about is data, and how data is generated. And these two pictures, you know, they say pictures speak a thousand words, right? This is a picture, guys, in, one second, oops, sorry, I didn't want to do that. What I meant to say was, here, in 2005, people are waiting in Vatican City. You see, every so many years a new pope is nominated, right, because either the
previous pope resigns or sometimes, unfortunately, passes away. So people are waiting for a new pope to be nominated. I don't know whether you know this or not, but it's a very, very fascinating process, how a new pope is nominated. Irrespective of that, let's not get into it. Once the new pope is nominated, with the white robe and that hat, they come onto the balcony and wave, and everybody's getting a picture. This is what happened in 2005. The same thing repeated in 2013. 2013 versus 2005: what is the difference? Everybody has a smart device. Cameras. Cameras, whatever, right. Absolutely, you're absolutely right. Cameras or iPads, whatever, basically it's a camera. Everybody has a smart device. Now, the moment you click a picture, what do you do? You send it on WhatsApp, you send it on Instagram, you post it on Facebook, you send it in an email, blah blah blah. Every time you do a digital transaction, data is generated. Every time you buy something on the web, data is generated. Every time you take an Uber ride, data is generated. Every time you do a banking transaction, data is generated. Every time you sell something, data is generated. Every time you use your credit card, every time you book an airline ticket, every time you do anything digital, data is created. Now that the data is created, I have this video, and I will play it. It's a very nice video. It's slightly outdated, in the sense that it's from a few years back, and the numbers have only grown, but it tells you how much data is created every 60 seconds globally. Listen to this video. Can everybody hear and see this? We can see but we can't hear. No audio, Professor. No audio? One second, I thought there was audio. There's no audio. Uh, you can't hear
anything? Very little, right? The volume? No, nothing can be heard. One sec. There's nothing to hear, it's just background music, basically. Is it better? Okay, this is what's happening every minute on the internet. So a lot of data is created, right? I mean, lots and lots of data. And that is just one minute of what happens on the internet. Imagine it 24x7: when one part of the world is sleeping, some other part of the world is awake, right? So it's a 24x7 process, and a lot of data is generated. Okay, now once you generate the data, from an AI perspective, what is it that you need to do? This is a very important thing that we will talk about. Let's do this. Now it's 10:22, I think it's been an hour and 20 minutes, right? Can we take a 10-minute break now and come back? There are two things I'm going to do: once you come back, I'm going to talk to you about the first assignment, and then we will continue. How does that sound to everybody? Or do you want me to continue and not give a break? I'm okay with that also. I leave it to you. What do you want to do? Let's take a break, Professor, if you can. Okay, perfect. So it's 10:23. Can you guys come back around 10:30, 10:31? Seven, eight minutes, is that okay with everybody? Sure, sir. Thank you, guys. Okay, I am back. Let's wait for a couple of minutes and then start, folks. Can everybody hear me? Yes, sir. Okay, Arun, you had asked a question. Sorry, I did not see that question. I will answer it: how about reinforcement and semi-supervised machine learning? Yeah, we'll talk about it. Sure, Prof. I didn't see that question, I apologize. I'll answer it. Okay, I think we can start, right, guys? Yeah, let's start. Because this is important, I just want to make sure most of you are here. Looks like 50, good. I hope everybody is back at their chairs or desks or whatever. So I'm going to start by talking about your assignment one. You
will have two assignments, guys, as we talked about in the first class. Any other technology the business problem may use. Okay, now, what I want you to do. The May 1st date is something we will talk about, and I will tell you why May 1st. This can be done either in a Word document or on PowerPoint slides. It is an individual assignment, and I will leave the format to you. What am I expecting? On slide number one, I'm expecting you to clearly define what the business problem is. On slide number two, I want you to define how you are going to use AI: basically, this is the problem, and AI will help in the following way. Then I want you to answer certain questions. First one: identify which bucket the problem belongs to. You will not know yet what a bucket is. You will learn that today, and you need to justify your choice. Second: conduct a gain-pain analysis. If you already know what a gain-pain analysis is, good. If not, you will learn it tomorrow. So for your business problem I want you to answer these two: which bucket, and we will talk about the different buckets today, and gain-pain, which we'll talk about tomorrow: what is the gain, what is the pain, how do I do it for the business problem that you have selected. I don't care which area you select the problem from. It could be in healthcare, it could be in finance, it could be in education, it could be any business problem. But you have to do that. The third point: next week we'll talk about infrastructure, data, and people. I want you to determine where you are going to get the data, from what sources, for the particular business problem that you have selected. Do you need edge, local server, or cloud? Where are you going to do the analysis? What edge, local server, and cloud are, we will talk about, but that is what
I need you to say. Do you need big data? Hadoop here basically means: is your data a big data problem, or is it regular, small data for which you don't need a big data solution? And what are the design-level requirements for storage and computation? I want you to say how many gigabytes of data you need and what type of speed you need. We'll talk about all those things when we come to the storage part, probably next week. For the particular business problem that you have selected, I want you to say these things. Next, as a leader, I want you to create a checklist or a form that you want your teams to fill out to get approval on an ML project. What criteria do you want to see? When your team comes and says, hey, this is a problem I want to select, I want you to create a checklist that says what you expect from your team before they select a business problem, or an ML problem. As a leader, you need to have that clarity. Identify uncertainties: I want you to identify what the uncertainties are in your business problem. And you define the success criteria for a POC, a proof of concept. Then, finally, I want you to create a deployment checklist. We will talk about each one of these in the next three to four classes: what you need to look for in deployment, what you need to look for in a POC, what you need to look for in infrastructure. But for your business problem, I want you to define all of these things, and you can put it in a slide presentation or in a Word document, I do not care. This is your assignment one. Five to six in total? Absolutely, five to six slides is good enough, guys. Don't make it into a book. Even for a Word document, two to three pages is what I'm looking for. Five to six slides is also what I'm
looking for in your PPT. Don't make it into a book, because I have to read 70, remember that. So please don't make it into a book. So does each one of these line items need one slide, or can we club multiple line items? I would say club them: blueprint one slide, infrastructure and data one slide, policy one or two slides, deployment one slide. You know what I'm trying to say. So it becomes four to five slides, right? It might be six, seven, or eight slides in that case. Yeah, whatever. I would say a maximum of eight slides, guys. Six to eight slides, keep it in that range. Please don't make it into a very big thing. The idea here is not being big. The idea is that it should go into your mind: what do I do? That's what I'm looking for here. On slide number one, define the business problem, and try to explain that one a little bit, because I can't ask you any questions. You know what I mean: I'll have to read it and understand the business problem. Any questions or comments? Now, the due date I've given is May 1st. I will finish all of this by April 15th, so I've given you two more weeks to do this. I think that should be good enough. Any questions or comments? Yeah, Professor, point number seven: is it related to the problem statement that we choose, or is it an independent checklist? No, no, to the problem statement you choose. No, I'm sorry, this is in general. Point number seven is general, exactly. But if I apply the checklist to your problem statement, it should pass. Be very careful. You know what I'm trying to get at? Got you, thank you. Yes, Professor. So don't create a checklist that goes against your own problem statement. What I'm going to do is say, okay, fantastic checklist; now, will this problem statement satisfy this checklist? Sorry, Professor, I'm missing something:
this one will be a separate exercise? It's not a checklist for the problem statement that I'm choosing? No, no, it is a general exercise, a general checklist that you as a leader want for any ML problem. But what I'm going to do is apply the checklist to your own problem and see whether it meets those criteria or not. You create a checklist and you ask the right questions, like what type of data I need, and I'm going to see: have you asked the right questions for your own problem? That's it, exactly. This is a general checklist you create for any problem statement. Okay, but all nine points are part of assignment one, right? Exactly. This checklist can be a general checklist, but apply it to your problem statement and see whether it makes sense. You don't have to put that in the slides, but do it mentally. Professor, this approval, is it from some body that we need to get approval from? Not my approval. Your approval, your own approval. Let's say you are leading a team of 10 people. Tomorrow somebody in your team, forget data scientists, says, hey, I want to do an ML project. You as a leader should think: what criteria do I need to decide whether this is a good ML project or not? Okay, so we should be in a position to know whether it is an ML-worthy project. Exactly, and we will learn it. Over the next four to five classes we'll go through different aspects of it, in terms of infrastructure, people, policy, all of that. So put all that together in your checklist. I want to see how creative you can be with all the learnings you have picked up. Okay, thank you so much. And again, guys, this is not... I'm sorry, go ahead. So, Professor, is it the same for points number eight and nine? Are all of those points independent, like this "identify uncertainties"? No, exactly. Oh, sorry: identify uncertainties for your business
problem, not general, you know what I mean? And the POC is also for your business problem. Very good, thank you, that's what I meant. But isn't that part of the first two items? Identifying uncertainties for a business problem and conducting a gain-pain analysis are kind of related. Exactly, the pain could be challenges, so you can sort of cover it there. But if there are certain uncertainties that you have no control over, I want you to identify them here. They could be part of the gain-pain analysis also, and some of them can be brought up here also. Absolutely, I agree with you. Is gain-pain basically about the benefits and risk analysis of the topic? We'll talk about gain-pain, don't worry, guys. When we come to that tomorrow, we'll talk about it. All of these points we'll talk about. There's nothing here that we will not talk about. Richard wanted to know the rubric. The rubric is very simple, folks. Basically, I'm going to grade this out of 40 marks. Very simple: what counts for 40% I'm going to mark out of 40, and what counts for 60% I'm going to mark out of 60, so 100 is 100 and I don't need to do any extrapolation or conversion. I'll give you the rubric next time if you want. See, I know that as students the marks are very important to you, and we will give them to you. For me, what you learn is important. So the way I'm looking at it is, basically, for blueprint I may give you, say, 12 marks, infrastructure something like that, you know, for each one of these. I'll give you the rubric next time, absolutely.
I will give you the rubric next time, guys, don't worry about it. I will give you the rubric, basically how I'm going to do this. One more thing: basic English is good enough. I'm not worried about the language, but be clear in the language and be as much as possible... Your voice is gone, Professor. Yeah, for the last couple of seconds. I was checking my internet. I think I lost him also. I think we lost him. It looks like he's still talking, but I can't hear him. He's frozen on my side, and his screen is not sharing anymore. Can you check with the professor? He's back, he's back. Sir, we can't hear you. I think you have to unmute, sir. Now can you hear me? Yeah, we can hear you. We lost you when you were talking about language. You said you're not specific about language, but that was a while back. Yeah, so basically, I'm sorry, what I said was: keep the language such that I can understand it. Try to explain it in such a way that I can understand it, because I don't have the luxury of asking you questions like in a presentation. So try to put yourself in those shoes when you do the slides or the Word document. That's what I meant. On the similar project for the previous module... Say it again? We have done a similar assignment for the previous module, Apply AI. Oh, okay. But did you answer these specific questions in the previous module? Yes, it was quite similar. We identified an area in our organization where AI can be used and what the ideal use of AI would be, and we drafted a business case. That's true, but that is one part. In that, did you identify the policy that you have to create, what type of data you need, what type of infrastructure you need, whether the solution is at the
edge or in the cloud? Did you identify all that in the previous module? You might have, I don't know, I'm just asking you. It was not. Yeah, infrastructure and policy were not requested. They were not asked specifically. So this is a different set of questions over here. See, what I'm trying to say is you may use the same business case, guys, but I'm asking you specific questions here. Correct. That's what I'm trying to get at. The business case itself is not important. Okay, so we can build on it. Absolutely. In fact, that would be great if you can build on that, right? That way it will be a continuation for you also. Go ahead, I'm sorry. So I was saying it'll be a more in-depth look at the same case study that we started in the last module, and we will be deep diving now, because I think we have started from the historical background of AI and how we are progressing towards the modern advancements and technologies. Exactly. And then you have to do a gain-pain analysis. Did you do it last time? I do not know. If not, you'll have to do it here. We didn't. And that was group work; here we have to submit an individual assignment. This is the individual assignment, exactly. And then also, here is where I want you to deep dive, because as leaders I want you to know when you want to put it at the edge and when you want to put it on the local server. When you say in this answer that you need edge, I don't want you to just write "edge". I want you to write one or two lines on why edge. Can you hear me? Exactly, so for each point, write one or two lines, right? That's it, that should be okay. Don't deep dive into it, with an infrastructure assessment and all the good stuff, please. I do not have the time to read it. That's the reason why I told you. But those two sentences should be the
most important. Give me two good points, so that when I read it I understand why you chose this logic. Okay. I have one question... Don't say statements like "because I like it to be at the edge." That doesn't mean anything, you know what I mean? So basically you require an architectural diagram with an explanation of each one? Yeah, exactly. For each one I need a framework. You're creating a framework, right, at a high level. That's what you're looking at: a high-level framework. Exactly. Then in the next assignment I'll go to the next, lower level. So you will start from the beginning and go down, down, down. So now you're understanding: these are the things I need to consider for an AI project. And none of these questions ask you mathematical questions or whether you know Python coding or not. Okay, so I have one question with respect to the policy component, point seven. Do you need a true assessment from an investment standpoint, or is this a generic policy? No, no, from a leadership perspective. Let's say you are a senior leader leading a team in your organization. Now four people in your team say, hey, I have these four projects where we can use AI. How do you decide which project? I want you to create the checklist. So you want a portfolio project assessment mapped to the strategy of the organization, a swim lane or something? Exactly. Or you don't worry about the strategy? Yeah, exactly. So one of the questions could be: will this be aligned to the overall goals of the organization? If yes, please define. I just need the questions, I don't need the answers. Okay. And seven to eight, that should be enough, right? Maximum, guys, please, I request you, because I have to read 70. That's the reason why. See here, the question, and then
what we can do, guys, in fact, in the last one we may pick one and I will go to the next level. Anyway, let's not talk about it now. Okay, any questions or comments on this before I go back to my PowerPoint? Okay, good. If you pick the same problem statement you did last time, absolutely no issues. But I'm assuming you did it in a group, so four or five people will be picking the same problem statement. Try not to do that. If you can pick individually, it'll be good, you know what I'm trying to say. So this assignment one is an individual assignment, right? Yes. There will be two assignments: the first assignment is individual, the second assignment is group. Okay. For the second assignment, will the metrics be the same as for the first assignment, or different? Metrics, as in? The evaluation criteria you mentioned, like blueprint, checklist, and all those things. No, they will be slightly different. I will give them to you, don't worry. Okay, sure, thank you. They will be similar, but not exactly the same. Okay, let me now go back to my, oops, okay. Now that we have done this, good, we have an hour. So one of the things that is very important when you work with data, guys, is something called feature selection. I don't know how many of you know about it. What is feature selection? For example, let us say you're collecting sales data for a grocery store. Let's say you own 10 stores in your area. For each store you record the total daily revenue: $500 for store one, $700 for store two, $1,000 for the next store. These are random numbers, right? You collect it for all 10 stores. What made them decide this answer? So, for example, let's say you're a doctor and you're looking at the X-ray, right, to say
whether this patient has, I don't know, tuberculosis or not, or cancer or not, or something. I'm just taking an example, right? Now you give 10 or 20 or a hundred X-rays, or MRIs or whatever, to doctors. They look at them and say: this person has cancer, this person has cancer, this person has no cancer, cancer, cancer, cancer. That is one level. Next you go to the doctor and say, doctor, can you tell me how you decided this person has cancer and this person has no cancer? The doctor will tell you: look at the X-ray, look at the dark formation here, I see multiple formations here, look at this other thing. They give you four or five reasons why they came to that conclusion. So what he did was talk to the doctors and work out what factors they used to make the prediction. And in some cases they say what formula they think you should use. They may say, you know what, if there is a patch here, plus I saw another report, maybe the blood work, the combination of this X-ray and the blood work made me predict that this person potentially has cancer. I'm just taking an example. Maybe it is totally wrong, maybe the blood work part is wrong, I don't know. So he asked them what factors they used to determine this, and then he actually put those factors into a very simple AI model. And you know what the answer was? This model beat the prediction by two to one. So the idea here is that selection of data and selection of features are very, very important in your project, and that is something you domain folks have to do. In your teams there are experts, and their role starts with identifying the problem. That has to come from the business. Do not expect anybody else to do that thinking. In a lot of these places, what they do is expect the data scientist to identify the problem. That is the worst thing you can do as a leader. The job of the data scientist is not to
identify the problem. The role of the data scientist is to execute on the problem, not to identify the business problem. The role of the business folks, the domain folks, is to identify the problem, say what data you need for the problem, and identify the factors that may have an impact. That is very, very important, and it has to come from the domain. So the domain has to work very closely with the data scientist. In a lot of these exercises, what they do is just throw the problem to the data scientist and say, hey, you go figure it out, I'll come back after three months and you tell me the answer. It will fail miserably, guys, because you can't expect the data scientist to be the domain expert. The data scientist is an expert in certain things, and certain things the domain has to do. This is a very, very important thing that you need to do. The next question is, when you start identifying the problem, you need to know what type of data you need for this problem. On average, data can be divided into three: nano, medium, and big data. What is nano data? Again, you can define it differently, and you can argue until the cows come home, and I will not be able to defend it scientifically. I don't have a formula. Generally, nano is anything less than 10,000 rows of data, or let's say 5,000 rows and fewer. A very small amount of data is called nano data. It can vary from 5,000 down to sometimes 10 rows of data, and I'll give you examples. Anything up to probably 50,000 or 100,000 rows of data is called medium. Anything above 100,000 rows of data is called a big data problem. This is one of the questions I have asked: for your business problem, I want you to say what type of data you have for your
problem: is it a nano data problem, a medium data problem, or a big data problem? And you have to justify it. If you're taking sales data, you say: I will take the last three years of data, and for this business problem I'll generally have 50,000 rows, 80,000 rows, because of such and such. Give one or two lines of justification. That is one of the questions I have asked in your assignment: what is the size of your data, nano, medium, or big, and why. Each has its own challenges, its own advantages and disadvantages, and I will talk about them. So is nano the bucket size you are referring to? Exactly. No, no, the bucket is... exactly, this is what it is, in the data part I had: which type of problem is it, nano, medium, or big? Now, nano means that sometimes it could be very small data, and it has its own challenges. Let us take an example. Let us say you are trying to predict a very, very rare form of genetic disease. In the entire world there may be only 300 people who have that rare kind of genetic disease. First of all, you may not get all 300. Let's say you only get 200 of them, and that is if you are lucky. If you get 200 people identified with that disease, you're going to get their data. For each person you may have 10,000 or 100,000 columns. What do I mean? You may collect every piece of data possible on them, and those are columns. For person one, you may collect their blood sample, their urine sample, their stool sample, genetic sampling, everything. In the blood sample alone you may measure 10,000 things, RBC and whatever, and each is one column. Those are all column data for one person, for row one. But you still only have 200 rows. Whatever you do, you may only have 200 rows, because you were only able to get 200 people. Now that may not be enough for
you to make predictions. So sometimes you have to do something different, and for that I will now talk about how prediction is actually done, at a very high level. What happens in a model, guys, is this. When you feed in the data, let's say 100 rows or 10,000 rows, it doesn't make a difference, what the model does internally is divide the data, say a thousand rows, into two buckets: one for training and one for testing. There is something called validation in the middle, and I will talk about it in a minute. But let's assume it takes 80% of the data for training and 20% of the data for testing. This is historical data I'm talking about. In my bank example, let's say that in the last five years you had 5,000 or 10,000 rows, or let's say a thousand people to whom you have given loans previously. Each row is one person, so you'll have a thousand rows of data. Now, when you feed those thousand rows in, the system divides them into two categories, training and test: 80% is used for training and 20% is used for test. What do I mean by that? In training, you feed in both the input and the output. What do I mean by input? In my example, the input was all the customer data: age, not name, salary, credit score, all of this. The output is: did I give the loan, and how much loan did I give? Let's assume a thousand rows of data. For 800 rows you give both input and output, input and output, input and output, and you train. Once the model gets trained, what happens next? For the remaining 20%, you only give the input and you ask the model to predict the output. Once the model predicts the
output, it will compare that output with the actual output, which is already there because this is all historical data, right? And that is how it determines the accuracy of the model. I hope I am clear. If not, I can repeat it once again. Any questions or comments? Was it clear what I said, or do you want me to repeat it? Yes, Professor, please repeat. Basically, let's assume you have a thousand rows of historical data. Out of the thousand rows, the model randomly selects 80% for training and 20% for testing. For the 80% of the data, you feed in both the input and the output, so the model knows: these are the conditions where you gave the loan, these are the conditions where you didn't give the loan. After that 80%, when the model is trained enough, it takes the remaining 20%, which is randomly selected. There you only give the input, ask the model to predict the output, and compare that output with what actually happened. Then you know the accuracy. If 8 out of 10 are correct, the model has 80% accuracy. If 7 out of 10 are correct, it has 70% accuracy. Then you go and feed in the real data, in the sense of future data. Thank you. I hope I made myself clear. Yeah, Professor, one question here. Yes, please. In the case of nano data, the quantity of data is already very small, and on top of that we are segregating the data. If you do 80/20, what is the... Exactly, I was about to come to that. Ask your question and I will give you the answer. That was my question, actually: then how does the model actually... Absolutely, excellent question. So his question was: you only have 100 rows of data already. If you take only 80 rows for training and 20 rows for testing, the data is already so little, and now you're making it even less for training, right? That was his question. At that point, hold on a second, okay, what it does
is: imagine 100 rows of data, right? That 100 rows of data is divided into chunks of 20, 20, 20, 20, 20; it divides it into five buckets, for example. Now, in the first case, it takes the first four buckets for training and the last bucket for testing. That is one model. Then it creates the next model: it takes buckets one, two, four, and five for training and three for testing. In the third one, it'll do two, three, four, five for training and one for testing. So basically it does every permutation and combination; it recycles itself in different combinations. That is one method.

The second method: there is always the opportunity to create synthetic data, especially when you have a small amount of data. Especially with unstructured data there's a lot more opportunity to create synthetic data. So a combination of those is what you will use for nano data. Excellent question. Any other questions or comments?

So then, technically, if we really have to apply this in practice, for the first use case we will apply an n-fold approach for doing the...

For nano, only for nano data. Generally, for medium and big data you don't need to do it. You still can do it, absolutely, nobody is stopping you, but generally you don't. Even if you do, at most for medium data; for big data you generally have so much data that you really don't need to do it.

But there are certain things you have to be very careful about when I say this 80/20. I said three things; not three, two things. See, what happens is, let's assume you have a thousand rows of data. There are a couple of things that are going to happen, guys. Number one, there could be some data that is missing. For whatever reason, that day the data was not collected, right? It happens especially when you're doing machine
sensor readings or something: for whatever reason the sensor went bad that day, or for whatever reason the data was not collected, and there are gaps in the data, right? Let's assume you had a thousand rows and 50 columns of data; maybe in the third column, second row, the data was missing for whatever reason. Now the data scientist comes to you and says, "Hey, this is missing data, what do I do?" There are multiple ways of filling in this missing data. One: the previous day's data is today's data; if you don't know anything, just take yesterday's data as today's data, right? Or two: I can take the average of the last five days, or the average of the last seven days. There are multiple ways of filling in the missing data, but the decision on how to fill it should come from you. I want you to ask the data scientist to explain what your options are, then pick the right option for you. And I'll tell you why that is important. If you leave it to the data scientist, nine out of ten times they take yesterday's data as today's data, because that's the easiest, right? Why think? That could lead to a small problem. What is the small problem it can lead to?

You see, there's another situation that happens any time you have data. Let's say you're collecting temperature. Generally the temperature is between, let's say, 20° and 30° for some process. Sometimes you may suddenly find 15°, or you may find 37°. Now the data scientist comes back to you and says these are called outliers. There is a very well-defined technique for detecting outliers. The data scientist will come back to you and say, "Hey, by the way, this data has outliers, what should I do?" I want you to decide. You can say: you know what, I know there is a problem, sometimes the sensors go kaput, so don't worry about it, those are not real data, you can ignore them;
or you can say: you know what, because of certain changes in the process this can happen, so I want you to model with this data, with the outliers. Or you ask the data scientist to do it both with the outliers and without the outliers. But that decision has to come from you. If by chance you don't say how to fill in the missing data, the data scientist may just take the previous day's data, and it may so happen that the previous data was an outlier, right? So he just filled in today's data with an outlier. It may be a rare condition, but it can happen. That is why I want you guys to be very thoroughly involved in the process, making these critical decisions, because these are business decisions that you know. I'm just using process as an example; it could be process, it could be finance, it could be supply chain, whatever. Do not allow the data scientist to make these decisions. Ask the data scientist, "What are my options for filling in my missing data?" He or she will tell you four or five options. Then you think, have a conversation with them, and decide which gives you the best option.

Now, the third question that you asked. What was the question? I'm sorry, I forgot the question.

It was about the nano data usage, I think, the last one.

Yeah, I asked about it, right.

Exactly, absolutely right. So how do I do it for nano data, right?

The data points will be constrained.

Yeah. So basically, I told you, right, there are methods: it's called k-fold, it does k-folding, or you can create synthetic data. But these are also the things you need to consider: how do I fill in missing data, how do I handle, what should I say, outliers. All of those points are very important for you guys to understand. The next thing that is also very important in data is to see whether public sources are available for you to compare against, if they are available. I'll give you an example,
guys. There was this disaster with Challenger, right, quite a few years back. Challenger, you know, the space shuttle, blew up. The moment that space shuttle blew up, if you look at the stock market, there were four companies whose stock just tanked. The four companies were Lockheed, Martin Marietta, Rockwell, and Morton [Thiokol]. Today Lockheed Martin is one company, I think most of you know that. At the time Challenger happened, Lockheed and Martin were actually two separate companies; then they merged, and that's why it became Lockheed Martin. So Martin Marietta and Lockheed were two individual companies. So if you see, these four companies were individually mainly responsible for building the shuttle, and all four tanked, because people didn't know what the problem was. But very soon, even before the final investigation was out, three stocks came back up and one actually kept dropping, and that was Morton. Why? I don't know how many of you know why the Challenger failed, but basically it was a $10 seal that cost a billion-dollar project. You see, this seal had a rated operating temperature range. I don't remember the exact temperatures, guys; I think it was something like 23° to 30° or so, so don't hold me responsible for the exact temperature. The idea was this: in the US, all of these are launched from Cape Canaveral in Florida. In Florida, in the previous 20 years, the temperature had never gone below the lowest temperature this seal could handle, so they never bothered to worry about it. On the day of the launch it went 2 degrees below that temperature, so the seal froze and gave up, and then it was a chain reaction; finally fire entered the cockpit. So a $10 seal actually brought down a billion-dollar program. So, the availability of public data: if there is some public data
available, try to see how much you can correlate it with whatever you're finding. It's always good to see what is happening in the world versus your own data. You can't always, but if you can correlate with that, it's very good. The next thing, which I've talked about but am still going to talk about, is... I'm sorry, you had a question?

Sorry, I was going to say, just a side note regarding Morton: there were engineers that told them...

Absolutely.

...that told them...

You're 100% right.

...they advised them not to launch, because there was a question about the integrity of the seals at that temperature. And the management people, whoever they were, overrode them, because, you know, you're talking about millions and millions of dollars if they delay. They went on with the launch anyway, and then it exploded. Those engineers who told them not to launch, of course they were right, but then the whole cover-up starts, and that's kind of how that went. But they were told not to launch by engineers. I'm sorry, go ahead.

Absolutely. In fact there's a very... no, no, I'm sorry, you go ahead, please finish.

No, no, I'm finished, I was just saying that.

You're absolutely right, Ronald. In fact, there is a very good documentary on this; if you just search on YouTube, there's a very good documentary on this. You're absolutely right: the engineers told them, guys, don't do this, delay by another two or three days, I don't remember the exact time. They basically said, once the temperature comes up, you can launch. But as you rightly said, there were two things that I think had happened. One, there was some pressure that they had to launch, because, I don't
know why, the Soviet Union was trying to launch something else, I don't remember. The second thing is, as you said, every day they delayed there were some millions of dollars they had to pay. So they said, launch it. And obviously everybody knows what happened, right? So it is a very, very good documentary, guys, if you can watch it. I think it is on YouTube and it should be free, but try it out. Fantastic documentary that talks about the entire history and the cover-up.

Now we'll talk about structured and unstructured data. What is structured data? It is data displayed in rows and columns, in relational databases; you know, it requires less, and that is the most important thing. And unstructured data is something that cannot be put that way: for example images, audio, word-processing files, emails, etc. Now, if you look at this, an estimated 20% of enterprise data is structured and 80% is unstructured. Look at your own company and tell me if this is true: how much information is there in your email exchanges, and sometimes in your verbal conversations? Where is this recorded? Will you use that in your everyday analysis? Yes? No? Hello, can you guys hear me? Oh, okay, for a second I was not sure.

Can you please repeat the question?

The question was: how much of the information that you talk about every day, in emails, sometimes in pictures and everything, is actually stored in your SAP database for you to do the analysis?

So it's only the transactional data that gets stored, right?

Exactly, that's the point, right? How much information is there outside the transactional data that could be used for your analysis, that we don't store?

Yes, Professor, and also there is that confidentiality factor, which doesn't allow everything to be stored in that way.

Exactly. So that is what Gartner has come up with. Okay, now what I'm going to
do is... are we done with our time, or do we have some more time, guys? From a time perspective, how are we doing? How many minutes do we have?

Nine minutes.

Oh, 9 minutes, okay.

Actually, we've got 18 minutes.

Oh, okay, 18 minutes is good enough, fantastic. Let's do 18. If it's okay with you, can I extend the class by about 10 more minutes? Is that okay with everybody? I apologize. I'll finish this concept and end it, if that's okay with everybody. Sorry about it, guys, but I want to stop at a logical conclusion, that's the reason. Okay, let's do this then.

Now I will talk about the models. Now that we have talked about data, I'm going to talk about the different models, and I'm going to end there today. First, let's assume a very simple case where you have two data points: the age of a machine and the number of complaints it receives. Only these two. And my historical data says whether this machine failed or not. So, depending on the age and the complaints, my clear circles, I'm sorry, not white but clear circles, are the data points where the machine didn't fail, and my darker circles are the points where my machine failed. Let's assume this is my input data: age and complaints are my inputs, and the answer is whether the machine failed or not. This is historical data. Now I train: let's say I had a thousand data points or 5,000 data points, it doesn't make a difference. I train my model; my model gets my data and creates a mathematical equation, such that the equation lets me draw a straight line, a line that basically separates my dark circles from my clear circles. If I have to create a line, how will that line look? It's a very simple question, it's not a trick question. How
will my line look? Can somebody just explain to me how the line that separates my clear circles and my dark circles will look? Can somebody comment?

It will be a linear line.

Exactly, like this, right? Yes. Okay, please just remove your line, Goku, so that my slides are... you're right, just remove it from my screen. Thank you. It will be like this, right? That is what a model is: a model is basically a mathematical equation for this line. Now, this is classification. I give an input for my new product: I give age and complaints. Now I can say, if the new data point is below my line, this machine will not fail; if the new data point is above my line, it will fail. So for the new data point where I need a prediction, all I give is age and complaints. Based on my historical data, if the data point is below this line I'll say it will not fail, and if it's above this line I'll say fail. This type of model is called logistic regression. This is classification. Why? The output is fail or not fail, yes or no.

Now, the same thing I can divide in a slightly more complicated way. See, instead of drawing one line, I'm going to break it into smaller chunks, because I want it to be more precise. This is called a decision tree. Now if I get a new data point: if it is below this line and left of this line, it becomes this; right of this line and above this line, it becomes this, fail or no fail. That is another model, called decision trees. This is all under classification. There are hundreds of models; if I went through them all, I'd spend the entire 15 classes doing only models. I'm going to give you two or three examples of each category. The third example I'm going to give you: instead of drawing a line, I'll take a very simple case. What I'm going to do is
this: the red one is the one I need to predict. I'm going to look at all the data points around this red point. If the majority of the data points around it are not-fail, I'm going to say this will also not fail; if the majority of the data points around it are fail, I will say the new one is also going to fail. This is called k-nearest neighbors, where K has to be an odd number. Why? Democracy, right? If it's six, and three around it are not-fail and three are fail, then I can't say. It has to be an odd number.

So the question is: why so many models? Which one works better? Let me tell you a secret: the data scientist doesn't know either. He or she is going to run multiple models, and whichever gives the best result, he or she is going to say, for this data set this is the best model. It's very difficult to say up front which model gives you the best result.

Now, this same question was asked to me by the CEO of one of the largest automobile groups in India, Mahindra. I don't know if any of you work for Tech Mahindra or the Mahindra group in general. The Mahindra group, guys, is one of the large industrial groups in India; they do everything from trucks to software. The CEO currently is Anand Mahindra. One time we were talking with him on one of our consulting projects, and Anand was sitting there, and he said: V, you're just coming up with these fancy names so that you can charge me more. I said, no, Anand, that's not the point. I asked him, how did you start? He said, the Mahindra group started as a small company making one component, and then we grew. So I said: you started with one component, or one automobile, and then grew and grew. Now you make cars, you make trucks, you make SUVs, you make all kinds of vehicles;
in addition to that you have a software division called Tech Mahindra. You do all that. Today you go to Australia and buy some company, you go to the UK and sell some company, and then you come back to me and say, I want to sell this company, add this company. It is not that I'm creating new models for the sake of creating new models. The data is becoming so complicated that I can't go back to my simple logistic regression every time, because the data is not so nicely divided into dark circles and clear circles, fail or no fail. It's becoming so complicated that, in order to address the complexity of your data, I need more models. So remember one thing: the reason more complicated, better and better models are coming up, for example ChatGPT, is that the data is getting so complicated, and in order to address it you need new models. That is the reason new models keep coming up.

But one thing is for sure, guys: in real life, I wish my data were nicely separated like this, right, all my clear circles on one side and all my dark circles on the other. In real life it'll never happen like this. How is real life? Real life is like this, where everything is mixed. This is how you need to make the prediction. See, I'm showing it graphically; I could have shown this as a mathematical equation, and you guys would have told me, take a hike, what are you talking about? So I've shown it to you graphically so that you understand. Each of these graphics has a mathematical equation behind it, but I'm showing it graphically so that you can visualize what a model, a mathematical model or algorithm, is. In real life, data is like this, everything mixed together, and I have to make a prediction on it. So what do we do? We take a small chunk of data and make a prediction, take another small chunk and make a prediction, take another chunk, and that is what we call an ensemble model. Ensemble means that instead of one model, we put
multiple models together in order to make a prediction. For example, AdaBoost and gradient boosting machines; a lot of these models you will learn in detail when you do deep learning and machine learning. I'm not going to spend a lot of time explaining these models, but I wanted to spend some time to give you an idea of what a model is, what it can do, and what the different types of models are. These are all classification models. Before I go into the next section, any questions or comments on the classification models? I hope the graphics were useful to you.

Yeah, Professor, one quick question. When you say that even data scientists can't answer the question of which model is best, then as a leader, when I give a problem statement and the data scientist comes to me with a solution or a model, how do I go about it in general?

No, no, exactly. When I said they don't know: at the beginning they don't know which model works best. So they try out four or five models, and whichever model gives them the best accuracy, that's the model they come back to you with.

Oh, okay, fine, got it.

That is what I meant when I said the data scientist doesn't know either: at the beginning of the problem, he or she doesn't know which model will give the best result. So they try out eight models, and out of the eight, whichever gives the highest accuracy, they come back and say, hey, this model works for this problem statement.

Okay, got it, Professor. So in one sentence I can say that there is no magic formula to decide which model would be the best one.

Yeah.

Okay, thank you.

So Shrikant had a very good question: why is logistic regression called regression and not classification? Excellent question, Shri, and let me give you one small example of the reason why it is called regression. Sudas called it a misnomer; yes, it's a misnomer, but there's also some truth to it. The truth is, the output of a logistic
regression is still like a mathematical, straight-line equation. It's not a straight line, but it's like an equation: y is equal to something like a x plus b x plus c, and that is why it is called regression. So, good question, that's what it is.

Sorry, Professor: so the features, in the previous example, are the age and complaints, and we can add on features related to that?

Exactly. Age and complaints are the two features, but I want you to add things that are not obvious. Let me give you an example; I was about to give you that example, excellent question to bring up. Let's take the sales data from 10 stores; I own 10 stores. Every day I have the total revenue for each store. So I get the sales data: store one, day one; I have day one for all 10 stores, then day two, day three, day four, for one year. So 365 days and 10 stores, so I have 3,650 data points, right? Basically 10 rows and 365 columns, and for each day I have total sales. That's one type of data. Now I look at this and say: hey, listen, on weekends I have more sales compared to my weekdays. So I can create a new feature called weekend, a new feature that didn't exist earlier, right? Similarly, I can say that during holidays, during Christmas, Diwali, or Thanksgiving or whatever, my sales go up, so I'm going to create special days. Then I see that at the beginning of the month, or whenever I send out sales or coupons, my sales go up, so I call those coupon days. So I create new features from the data points, features that didn't exist before, that will give the model more insights and more dimensions, more ideas and more patterns to think about. Those kinds of things are something that you, as business folks, have to think of.

Excellent point. And Professor, just to add on to that,
the question I have: you initially told us that the features have to be decided by the domain folks, right, generally. Now the additional features, the hidden features, that is the job of the data scientist, right?

No, unfortunately. It could be a combination of data and domain experts, because the data scientist may not have that kind of in-depth knowledge of the data, right?

Okay, but they are the ones who will analyze the data, right?

Absolutely, they will analyze. You give them the data, they put it into the model, and they'll come back to you and say, you know what, this model works well. Then you can say, add this. They will do the tweaks to the model, then they'll say this is an outlier point, this data is missing, all of that they will say. But the new features that you're creating are something you understand from a domain perspective, right? For example, you know that these are coupon days; you can't expect the data scientist to know that you gave out coupons on those days, right? I'm just taking a very simple example. So that is the part I expect from you, and that's why the domain experts and the data scientist should work very closely together.

Next: I used the word anomaly for the first time. What is an anomaly? Let us take an example. In whichever city you are in, take a notepad and a pen, go around the city, and randomly ask people who are above the age of 18, let's say above the age of 18. Randomly go and ask 10,000 people: did you have a heart attack in the last five years? What do you think the data will look like, generally? Majority no, right? Not all no, but majority no. Do you agree with that?

Yeah, but after COVID there are certain situations, so it has risen recently.

Exactly, but even today, if you ask randomly, the majority will still be no, right? I'm not going to go to a
heart institute and ask that question. But randomly, on the street, pick some random street, 10th Street or 50th Street, pick a person, and ask them the question. Even though, as you said, it's after COVID, I think the majority will still be no.

Similarly, if you look at machine failure, most machines in your manufacturing processes don't fail, guys. Why? Because people do regular maintenance on them, whether you like it or not. Why? Because changing a $20 filter is cheaper than the machine going down. Whether that filter needs changing or not, somebody told them to change that filter every two weeks, and they'll change it. It is cheaper to change a $20 filter than to have the machine go down, right? So whenever you have such a situation, where you overwhelmingly have one side of the data and on the other side you have hardly any data, how am I going to predict? Because when I feed this data to the model, it overwhelmingly learns a no rather than a yes. Let's say no means no heart attack: overwhelmingly it's no, and maybe there are three or four or five or six cases where it's a yes. If there are 394 cases where it is a no and three cases where it is a yes, it's very difficult for the model to learn the patterns. So whenever there is a majority in one direction, yes or no or whatever, you use something called anomaly detection.

What is anomaly detection? Basically, in anomaly detection you try to create a boundary around all the known good points and say, this is the boundary for all the good points. Now that you have a boundary, whenever you get a new data point that you have to predict, you try to find out whether the new data point is within this boundary or away from it. If it is within the boundary, you say the probability of something happening is very low. If it's away, you give a probability based on how far the point is from the boundary: the farther it is, the higher the probability.
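
The boundary idea just described could be sketched roughly as follows. This is only an illustration under stated assumptions, not the method used in the course: the boundary is assumed to be a simple circle (center plus radius) around the known good points, the function names `fit_boundary` and `anomaly_score` are hypothetical, and the sample readings are made up.

```python
import math

def fit_boundary(good_points):
    """Build a circular boundary around known good points.

    Assumption: the boundary is the smallest circle, centered on the mean
    of the good points, that contains all of them. The points must not all
    be identical, otherwise the radius would be zero.
    """
    dims = len(good_points[0])
    center = tuple(
        sum(p[d] for p in good_points) / len(good_points) for d in range(dims)
    )
    radius = max(math.dist(center, p) for p in good_points)
    return center, radius

def anomaly_score(center, radius, point):
    """Return 0.0 inside the boundary; outside, the distance past the
    boundary measured in units of the radius (larger means more anomalous)."""
    return max(0.0, math.dist(center, point) - radius) / radius

# Made-up "good" readings: (machine age, number of complaints)
good = [(0, 0), (2, 0), (0, 2), (2, 2)]
center, radius = fit_boundary(good)

print(anomaly_score(center, radius, (1, 1)))  # a point inside the boundary
print(anomaly_score(center, radius, (4, 5)))  # a point well outside the boundary
```

The score is not a true probability; turning distance into a probability would need a further modeling choice that the lecture does not specify.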
That, at a very high level, is what is called anomaly detection.

Now Jack Cho asks a question: no matter which method, data quality is crucial. How can we ensure appropriate data is available? Would alternate data work if the model is revised to accommodate it? See, two things here: data quality and, what should I say, appropriate data. Now, appropriate data: I'm assuming, Jack, what you're asking is, do I have enough data? Is that what you mean by appropriate data, or are you talking about the quality of the data?

The quality of the data, not sufficiency.

Exactly. So if the data quality is extremely poor, nobody can do anything about it, right? There has to be some amount of good data. If it's really poor: garbage in, garbage out. You have to spend a lot of time cleaning the data, and sometimes you can't even clean it, because if, let's say, a large amount of data is missing, you're done. If the data is wrong, someone from the domain has to say this data is wrong and what the right answer is. So if the data quality is really poor, nobody can do anything about it. What you can do is look for alternative sources. For example, there could be other publicly available databases, and you can train on those. But that is not your data, so how accurate it will be has its own challenges. That's the reason why, when you look at these large models, they train with such a large amount of data that even if some quantity of the data is bad, so much good data is available that the model learns from it, because with a large amount of data the probability is high that a lot of it is correct. So data quality is a very, very critical factor in this, especially if you're training with your own company data. If the data is so bad that you can't trust it, there are ways you can clean it up to a point, but after
that the model can't do anything about it. Good point, but that's the truth; it is what it is. Good question.

Okay, the next two slides I will do and end today's class; I know I told you I'm going to take 10 more minutes. The other one is predicting a value. What do we do? We have all these blue data points, and the model tries to create an equation, a straight-line equation, that is closest to most of the data points. Not that it goes through all the data points, but it is closest to most of them. That is the equation it tries to create, and that is called linear regression. Once you get a mathematical equation, you can predict with it. Then, for optimization, you take each section and try to optimize it, then take another section and try to optimize it, then another section; so basically it creates these chunks of optimization. That's how optimization is done. And finally, clustering is basically putting my input data into, what should I say, groups: for example, clustering the input data into high-value and low-value customers. So at a high level, this is what modeling is, and everything I showed you is on structured data. Tomorrow I'm going to spend one hour explaining unstructured data, and this is where the fun starts. Then we will go into the blueprinting and the real stuff. Any questions or comments? Thank you, guys, I hope it was interesting to you.

Professor, since we spoke about data, can you also cover overfitting with large data?

You know what, I can definitely cover it, but you have a course called machine learning where they go deep into all of these. I am covering this so that you understand the blueprinting and the other parts for the project. This is not the course to cover overfitting, underfitting, and everything. Not because I
don't want to cover it; you have a course called machine learning where they cover all these things completely. I know some of you may be thinking, what is overfitting and underfitting? Sorry, guys, but you will definitely learn it there.

Are big data problems deep learning problems?

Is it a big data problem? Absolutely, and we will talk about that when we talk about big data; when we talk about infrastructure, we'll talk about big data, yes.

You didn't talk about quantum computing among other emerging tech.

Yes, absolutely, Sidar. Quantum computing is there when I talk about computing power. Probably not tomorrow, but next Saturday I'm going to spend a little time talking about computing and quantum computing. Not much, but we will talk about it. Excellent point. Anything else? If not, thanks, guys, really, thank you very much. Thank you, bye.

Thank you, everybody. Thank you, Professor. Bye, everyone. Thank you, Professor, bye. Thanks.
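
For reference, the accuracy check and the five-bucket rotation discussed in this session for small ("nano") data sets could be sketched roughly like this. It is a minimal sketch under assumptions, not the course's implementation: the helper names (`k_fold_indices`, `knn_predict`, `cross_validate`) are hypothetical, a k-nearest-neighbors vote with K = 3 stands in for whatever model a data scientist would actually try, the rows are assumed to be already shuffled, and the ten (age, complaints) rows are made up.

```python
import math
from collections import Counter

def k_fold_indices(n_rows, folds):
    """Split row indices 0..n_rows-1 into `folds` contiguous buckets of
    near-equal size (assumes the rows were shuffled beforehand)."""
    base, extra = divmod(n_rows, folds)
    buckets, start = [], 0
    for i in range(folds):
        size = base + (1 if i < extra else 0)
        buckets.append(list(range(start, start + size)))
        start += size
    return buckets

def knn_predict(train_x, train_y, point, k=3):
    """Majority vote among the k training points closest to `point`."""
    nearest = sorted(
        range(len(train_x)), key=lambda i: math.dist(train_x[i], point)
    )[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

def cross_validate(xs, ys, folds=5, k=3):
    """Hold out each bucket once for testing, train on the remaining
    buckets, and return the accuracy measured on each held-out bucket."""
    accuracies = []
    for test_idx in k_fold_indices(len(xs), folds):
        held_out = set(test_idx)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        correct = sum(
            knn_predict(train_x, train_y, xs[i], k) == ys[i] for i in test_idx
        )
        accuracies.append(correct / len(test_idx))
    return accuracies

# Made-up historical rows: (machine age, complaints) -> 1 = failed, 0 = did not fail
xs = [(1, 0), (9, 8), (2, 1), (8, 9), (1, 2), (10, 7), (3, 1), (9, 9), (2, 2), (8, 8)]
ys = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]

scores = cross_validate(xs, ys, folds=5, k=3)
print(scores, sum(scores) / len(scores))
```

Each accuracy is the fraction of held-out rows predicted correctly, which matches the "8 out of 10 correct means 80% accuracy" rule of thumb from the session; averaging across buckets gives a steadier estimate than a single 80/20 split when there are few rows.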