Good evening everyone. Give me one second, I'll just share my screen, and then we'll jump into the session. All right, let's see where we left off. Okay, we'll start here.

So we'll pick up where we left off yesterday. Just to recap: in the first class we took a historical view and saw the evolution of AI. We looked at the key milestones AI has achieved over the last 30-odd years, and we briefly talked about what the next few decades are likely to hold for AI. Yesterday's session used data as the central lens to view the field of AI. What we basically said yesterday was that there are different data formats, and we can broadly bucket them into three. On the right-side extreme is what we call structured data: spreadsheets or relational databases, typically found in enterprise data warehouses. If you hear terms like MS SQL, MySQL, and Oracle, these systems typically hold relational data. Then there is semi-structured data, which is the dominant format when data has to be in transit across the web, between a client machine, like the one you're likely using right now through a web browser, and a server which is serving content. When data is in transit, it is in a format like JSON, XML, or CSV, and this kind of data has its own dedicated stores like MongoDB and CouchDB. And then there is unstructured data, which can further be broken down into text, which can be in the form of PDFs; images, which can be in the form of JPEGs; audio, which can be in the form of MP3s; and so on. That's the spectrum we looked at.

We used these data formats to create a rough working definition of how to differentiate between data science and AI. We said a good approximation is to call data science the field where machine learning algorithms are applied to structured data, towards the right end of the spectrum, and to restrict the term AI to machine learning algorithms applied to the left side of the picture: text, image, audio, and video. The justification we gave for this working definition was that AI, by definition, is trying to recreate human intelligence in machines, computers, and algorithms, and human intelligence fits nicely into the ability to talk, to read, to write, to look at objects, to hear, to speak. That is why these are the unstructured data formats we introduced, and why we said AI can be thought of as working on the kind of data towards the left-hand side of the screen. We saw an example of how the same information content can be modeled as unstructured data, as fully structured data, and as semi-structured data. We went into detail about how analyzing structured data can help generate insights, what the real-world applications are, and where you should think about applying machine learning to structured data and doing data science in your organization. We also looked at text: if you have text data, what kinds of applications can you think about in the real world and potentially use in your enterprise or organization?

Today we're going to start by talking about image and video data. This field is called computer vision, and sometimes image processing, but the dominant name for the whole field is computer vision. Just like we had NLP, natural language processing, as the umbrella term for applying machine learning algorithms to text data to create insights or generate information, today we will use the term computer vision: the equivalent of NLP in the image and video
world is called computer vision.

There's a question here from Yo: "Referring to yesterday's class, on the trained models of each domain, can we have a main system to coordinate these domain models? It takes a very long while to train a model." If you're interested in the topic, it's an interesting question, Yo. This is a field called mixture of experts. I don't think we're going to cover it in any of the classes; it's pretty cutting edge. The idea of a mixture of experts is pretty much along the lines of what you said: you have multiple models trained on different subsets of data. These subsets can be data-type based, for example one model trained on text, another on images, another on video, or they can be domain based: one trained on legal, another on medical, a third on social media. Then you have a layer on top which, depending on what the input is and what the user is asking, redirects the query, or the interaction, to one of the models. This is called mixture of experts. Even though OpenAI has not said it out in the open, a lot of practitioners believe that GPT-4 is actually using a mixture of experts underneath the hood. So yes, that is definitely possible.

Okay, carrying on with our discussion of computer vision, where we talk about image and video data, let's first define this space. The definition of computer vision, and we will go through this slowly, sets the context of what this field does: interpret, understand, and make decisions based on visual data from the world. The three key phrases are interpret, understand, and make decisions, and of course we'll see lots of examples of what that means.
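Since the mixture-of-experts idea came up in the question above, here is a minimal sketch of the routing pattern. The expert functions and the hand-written keyword gate are toy assumptions for illustration only; real systems learn the gate jointly with the experts, and this is not a claim about how GPT-4 actually works.

```python
# Toy sketch of mixture-of-experts routing: several "expert" models, each
# specialized for one domain, with a gating layer on top that redirects
# each query to exactly one of them. All names here are hypothetical stubs.

def legal_expert(query: str) -> str:
    return f"[legal model] handling: {query}"

def medical_expert(query: str) -> str:
    return f"[medical model] handling: {query}"

def general_expert(query: str) -> str:
    return f"[general model] handling: {query}"

def gate(query: str):
    # The gate decides which expert sees the input. Real systems learn
    # this routing; we fake it with keywords purely for illustration.
    q = query.lower()
    if any(w in q for w in ("contract", "lawsuit", "clause")):
        return legal_expert
    if any(w in q for w in ("symptom", "dosage", "diagnosis")):
        return medical_expert
    return general_expert

def mixture_of_experts(query: str) -> str:
    expert = gate(query)   # pick one expert per query
    return expert(query)   # only that expert runs

print(mixture_of_experts("Review this contract clause"))
print(mixture_of_experts("What dosage is safe?"))
```

The point of the design is that only the selected expert runs for a given query, which is what makes the approach cheaper than running every model on every input.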
The seven most commonly used computer vision applications in the real world, as a first-level approximation, are the following. The first is image segmentation: if you go out onto the street in your city and take a picture, the ability to draw a box around discrete objects, saying this is one object and this is another, is called image segmentation. Now, this comes very naturally to us, and we find it a trivial task from a human intelligence perspective, but it's a fairly challenging task for an algorithm or a computer, and a lot of work has been done on image segmentation. Second is object detection, which is slightly different from image segmentation: here you specifically say, "here is an object I'm looking for." You can say "I'm looking for a car," the algorithm runs, and from an image it finds the car. A real-world example: assume you have a drone flying over a city, say over a main road, and the drone has captured an image of that road. You can say "I'm looking for an ambulance," and an algorithmic way of finding an ambulance in that drone image is a good example of object detection. You give the input saying this is what I'm looking for, and it goes and finds that object. Facial recognition: I guess everybody on the call has used it; your laptops and mobile phones today have face detection technology, and that's an application of computer vision. Edge detection: this is basically saying, here is the boundary of an object, and the algorithm carves out the edge of a human or a cat or a dog. Pattern detection basically says that, without the user having to prompt anything, you automatically find patterns in the data. This can be useful in satellite imagery, for example: you look at satellite imagery and run a pattern detection algorithm which can say this is agricultural land, this is industrial land, and this is urban city; that's a good example of pattern detection combined with edge detection. Image classification: that basically means you have different kinds of images. Suppose you have a brochure of products, with images of the many different products you sell; let's take IKEA as an example. You have images of millions of products, you run an algorithm, and you say: take an image and tell me, is this a chair, is this a bed, is this a sofa? That's an image classification problem. And then there is feature matching, where different images are compared in terms of their underlying visual features.

There's a question: "What are the different multimodal models used as part of vision?" Sorry, I didn't understand the question; you might want to reframe it. There's a question from Celine: when GPT-4 Vision analyzes the image of a chart and then interprets it, what category does that fit into? I can't make definitive comments about GPT-4 Vision, because OpenAI has not really released what it does. But if there were a hypothetical model that did what you're describing, Celine, it would actually not be a single model; it would be a pipeline, something we've talked about: most real-world applications do not use just one of these techniques, they use a combination. For example, the first thing you would need is for an algorithm to recognize that this is a chart, so the first thing
that you would do is classification: an algorithm which takes different kinds of inputs first needs to know whether the input is a car or a chart or a truck or a plant or an animal, so the first thing you need is a chart classifier. A second level of classification can be: what kind of chart is it? Is it a pie chart, a histogram, a bar graph? That's a second classification problem. And once you have that, you would probably have a dedicated model which is able to map the visual content to text data and then answer questions from that text data. So very likely it's a pipeline that is doing what you're asking for, Celine.

Okay, next question: "Could you please expand on the edge detection problem?" Well, the edge detection problem is, again let's take an image of a street: unlike image segmentation, where you're drawing a bounding box which is a square or rectangle, edge detection is saying, can you draw the boundary of the object in its native form? As you can imagine, edge detection will be needed for face recognition: you have to carve out that this is actually a face before you can try to match it against anyone's face. A follow-up: "I meant, apart from VGG16, what are the different models used in vision?" You are going to see examples of different algorithms in subsequent classes, where we'll talk about what algorithms are available; today we are just giving an overview of what computer vision is. There are several algorithms used in computer vision: convolutional neural networks, for example, are a whole field, and capsule networks are another. The best way to think of it is that it's a jungle, a zoo: there are a lot of algorithms out there. I'm not going to talk about algorithms today; the objective today is to give you a flavor of what this field is.
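To illustrate the pipeline answer above, here is a hypothetical sketch of the stages chained together. Every function below is a stub standing in for a real trained model, and the dict-as-image representation is purely illustrative.

```python
# Hypothetical sketch of the multi-stage chart-reading pipeline described
# above. Each stage would be a trained model in a real system; here they
# are stubs so the structure of the pipeline is visible.

def classify_image(image) -> str:
    # Stage 1: is this a chart, a car, an animal...?
    return image.get("kind", "unknown")

def classify_chart_type(image) -> str:
    # Stage 2: which kind of chart? Pie, histogram, bar...
    return image.get("chart_type", "unknown")

def chart_to_text(image) -> str:
    # Stage 3: map the visual content to text data.
    return image.get("caption", "")

def answer_question(text: str, question: str) -> str:
    # Stage 4: answer questions from the extracted text (stubbed).
    return f"Based on '{text}': answering '{question}'"

def chart_qa_pipeline(image, question: str) -> str:
    if classify_image(image) != "chart":
        return "not a chart"
    chart_type = classify_chart_type(image)
    text = chart_to_text(image)
    return f"({chart_type}) " + answer_question(text, question)

# Toy "image" represented as a dict, for illustration only.
img = {"kind": "chart", "chart_type": "bar", "caption": "sales by quarter"}
print(chart_qa_pipeline(img, "Which quarter was highest?"))
```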
A question from Prateek: how do object detection and edge detection differ? Okay, I guess we're getting stuck on the first slide because we're going too deep; this is the last question I'll take on this slide, and then we'll move forward. In object detection, you ask about a particular object in the image: for example, "find me an ambulance in this image." There's an input saying "I'm looking for an ambulance." Edge detection is unsupervised: you give it an image, and the algorithm draws edges along all the different objects it finds, without necessarily understanding or mapping whether this is an ambulance versus a cat or a dog. All it is doing is drawing a boundary along each object, and that is called edge detection.

So the objective of computer vision is to replicate the human visual system. Again, go back to the definition of AI: we are trying to replicate human intelligence. We looked at NLP, which is the text-related intelligence of humans; computer vision is the vision-related intelligence. What we're trying to do here is replicate the human visual system's ability to perceive, understand, and interpret visual information. As I said, image processing is a related field, sometimes considered a subfield of computer vision, which starts at a lower-level representation of images, pixel-level representations, and manipulates those pixel-level representations to enhance images.

Now let's look at some real-world examples, starting with object detection, or object recognition. Here is the problem statement: you are given an image, or a frame from a video, which is also an image, and your objective is to identify and locate objects inside that image. Here's a typical example: an image, or a frame of a video, taken from a self-driving car, which is a very common use of object detection and recognition.
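To make the edge detection idea concrete, here is a minimal sketch using Sobel gradient filters on a toy grayscale image. Production systems use more robust detectors (e.g. Canny), but the principle is the same: edges are where pixel intensity changes sharply, which we measure with gradient filters.

```python
import numpy as np

# Sobel kernels measure horizontal and vertical intensity change.
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def convolve2d(img: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    # Naive sliding-window filter, no padding (for illustration only).
    h, w = img.shape
    kh, kw = kernel.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def edge_magnitude(img: np.ndarray) -> np.ndarray:
    gx = convolve2d(img, SOBEL_X)   # horizontal intensity change
    gy = convolve2d(img, SOBEL_Y)   # vertical intensity change
    return np.hypot(gx, gy)         # gradient magnitude = edge strength

# Toy image: dark background with a bright square "object" in the middle.
img = np.zeros((8, 8))
img[2:6, 2:6] = 1.0
edges = edge_magnitude(img)

# Edge strength is high only around the square's boundary and zero in
# flat regions (inside the square and in the background).
print(edges.round(1))
```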
You will see that edge detection has been applied first, carving out the edges, which is what you can see visually in the picture. Then, on the right-hand side, it has done one more thing on top of just drawing edges: it has labeled the different objects. It recognizes that this thing here is a car, the thing on the left at the top is a road sign, this is a bridge, here is another truck coming from the other direction, and this is the road. The object detection output is on the right-hand side of the UI, whereas the edge detection is embedded inside the visual itself. So one common use case of object detection and recognition is in self-driving cars: to detect pedestrians, to detect vehicles, to detect oncoming obstacles on the road.

Another interesting example is traffic management. There are cameras at road junctions now, and they can be used to detect congestion in a particular part of the town or city, or, for example, to run intelligent traffic signals which turn green only when enough people have stopped at the signal. Video surveillance is both a security feature, if you go to public buildings, malls, and office complexes you have these cameras recording the movement of vehicles and people, and a way of monitoring public spaces, which you can use for crowd control and law-and-order applications. Retail analytics: if you have a retail store, a shop, or a mall, one way this is used is that you record video of customers moving around your store and use it, for example, to measure footfall: how many people come in on the weekend versus a weekday, how many customers walk in between 9:00 a.m. and 12:00 noon, during lunchtime, or in the evening hours. You can also see how customers move inside the store, and this can be used to optimize which products to place where. Disaster response is another great use case: you can fly drones over disaster-impacted areas to identify whether there are survivors in a particular location, look for damage, and manage search-and-rescue operations. Environment monitoring is commonly done with cameras installed in forests and wildlife areas to monitor the movement of wildlife, elephants, tigers, lions, and to track deforestation; if you do it in a city environment, you can also use it for monitoring sources of pollution. Content retrieval, or image search, is another great example: this is how Google image search works. When you type in an object name and Google pulls up images containing that object, underneath the hood they have run an object detection algorithm and identified the different objects in each image, so that when a user types an object name, the matching images are pulled up. So those were all examples of object detection and recognition, computer vision tasks inside AI.

Here's another great set of applications: anomaly detection. What is the problem statement? Again, you're given an image, or a video, which is a sequence of images, and your objective is to identify unusual or unexpected objects or events. Here are eight examples. In the first picture, at the top left, there is an anomaly: you see a truck moving on a footpath. You can do object detection here, but you need more than object detection: how does the algorithm, the machine, know that this is not normal? Think about it like this: if you had hours of recording of this footpath and had never really seen a truck, and you're comparing these thousands of hours, these millions of frames, and a few frames stand out because they contain something none of the other frames contain, then the algorithm can flag them. So a truck moving on the footpath gets flagged. There's a pedestrian walking on the grass; again, this is rare. The whole idea of an anomaly is something that is not normal, not typical, something you don't see most of the time. So again, if you have thousands of hours of recording of this area, most people would be walking along the concrete path; once in a while you'll see people moving across the grass, and the algorithm can say these frames are different from the rest of the frames, because in these frames I see a human figure on grass, and that's not typical. Here again you see somebody throwing an object in the air; in millions of hours of recording you will see this very few times, so it can be flagged by the computer. Here's a person carrying a bag, which may not be normal in this scenario. Here is incorrect vehicle parking: you can imagine that most cars are parked along the parking spots marked in green, and if you have many hours of video and see a few frames where a car is cutting across the parking lanes, that can be flagged by the algorithm. There you see people fighting, hand-to-hand combat, pushing each other: very rare, and therefore anomalous and detectable by algorithms. Here's a person catching a bag, and here is a vehicle moving on the footpath. So, all different examples of how algorithms can be used for detecting anomalies. An anomaly, just think of it as something which is not typical. Whether it is dangerous, whether it is a violation of the law, is not a statement the algorithm is making; all the machine-learning algorithm is doing is saying, "hey, this is not typical, and it might be interesting," and then a human decides whether it's really interesting or not.
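The "compare against thousands of typical frames" idea above can be sketched in a few lines. The frame vectors, the threshold, and the use of raw pixel values as features are all toy assumptions here; real systems use learned features.

```python
import numpy as np

# Build a profile of "typical" frames from hours of footage, then flag
# frames that are unusually far from that profile.

rng = np.random.default_rng(0)

# Toy data: 500 "typical" frames (flattened to feature vectors) that all
# look roughly alike, plus one frame containing something unusual.
typical = rng.normal(loc=0.5, scale=0.05, size=(500, 64))
unusual = np.full((1, 64), 0.9)          # e.g. a truck on the footpath
frames = np.vstack([typical, unusual])

# Profile of normality: mean frame and per-feature spread.
mean = typical.mean(axis=0)
std = typical.std(axis=0)

def anomaly_score(frame: np.ndarray) -> float:
    # Average absolute deviation from the mean, in units of spread.
    return float(np.mean(np.abs(frame - mean) / std))

scores = np.array([anomaly_score(f) for f in frames])
flagged = np.where(scores > 3.0)[0]      # threshold is a design choice

print("flagged frame indices:", flagged)  # only the unusual frame stands out
```

Note that the algorithm only says "this frame is far from typical"; deciding whether the flagged frame actually matters is left to a human, exactly as described above.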
All the algorithm is saying is: this is not normal, this is anomalous. And that's where certain applications of computer vision are powerful.

Next is face recognition. The definition of face recognition: you're given a face image or a video, and you have to identify who the person is. A very simple problem statement, lots of applications, and very popular now: access control and security, unlocking smartphones and devices, law enforcement and surveillance, identity verification in financial transactions or when walking into a bank, attendance tracking in factories, employee time-and-attendance management in offices, airport security and border control, and social media tagging. So lots of applications of face recognition, which falls under computer vision; I'm sure everybody has used it.

Here is an interesting one: video summarization, kind of like text summarization, but now with images. The problem statement is: you're given a video, and your objective is to create a shorter, information-rich video. Why would you do that? There are great applications in surveillance and security. If you think about it, there are so many cameras in our world right now, generating such a large amount of video data, that the number of humans you would need to watch the whole recording is just too many. So you put an AI or ML algorithm on top of it and say: here is 24 hours of video; create a short summary of 30 minutes. And your objective, of course, is to see the most interesting 30 minutes. So how do you derive, from a 24-hour video, the frames which you believe are interesting? In a mathematical sense you can think of it as similar to anomaly detection, because you're looking for information-rich frames, but there is an added constraint that the video needs to flow: you cannot just string together information-dense frames; there is an idea of continuity here, so that the user watching it can actually review the events or identify if something went wrong. Shorts are becoming very popular on social media: if you have used platforms like Instagram, you can create these shorts, these informative clips, from long media content. You don't need to edit manually; you can use algorithms to create shorts from longer content. Highlights from sports, live events, and games are another great example. Here is an example where you have a recording of somebody paragliding. As you can imagine, if it's a 20-minute recording, a lot of it will look very similar, and if you want to shrink a 20-minute paragliding video into a two-minute summary, you would very likely expect it to capture the takeoff, the landing, and other interesting events which were significantly different. The way you do it: the video is basically a sequence of frames, as you can see at the bottom, and you use algorithms to identify which frames are interesting, then stitch them together in a way that preserves fluidity; that's how you create key highlights. This is also the way to automatically create highlights for matches in your favorite sport, soccer, basketball, cricket, what have you. Editors and filmmakers use it to create trailers, or at least a first cut of trailers; of course you need human intelligence in that case, because it's commercial. Consumer video: a lot of video streaming platforms auto-generate condensed previews for longer videos, so that when users mouse over, they can see what the video contains and decide whether they're interested enough to go and watch the whole thing.
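A crude version of that "pick the interesting frames" step can be sketched as follows, assuming frames are plain arrays and using raw frame-to-frame change as the interestingness signal. Real summarizers use learned features and continuity constraints; this is only to show the principle.

```python
import numpy as np

# Keep a frame whenever it differs enough from the last frame we kept, so
# the summary captures big changes (takeoff, landing) while skipping long
# stretches that look the same.

def summarize(frames: np.ndarray, threshold: float = 0.2) -> list[int]:
    kept = [0]                           # always keep the first frame
    for i in range(1, len(frames)):
        diff = np.mean(np.abs(frames[i] - frames[kept[-1]]))
        if diff > threshold:             # big change => interesting frame
            kept.append(i)
    return kept

# Toy "video": 10 nearly identical frames, with sudden changes at frames
# 4 and 8 (standing in for takeoff and landing).
frames = np.zeros((10, 16))
frames[4:] = 0.5
frames[8:] = 1.0

print(summarize(frames))  # → [0, 4, 8]
```

Keeping the selected frames in their original order is the simplest way to preserve the "flow" mentioned above.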
Video summarization is also useful in emergency response: if there is a crisis, you want to quickly see what's going on and what kind of emergency-response support is needed, maybe using cameras on road traffic signals or on drones, to analyze what is going wrong where, so you can manage the response. So that was video. Let me take a pause here: any questions on computer vision and its applications in the real world, before I move on to the last type of data I want to talk about, which is audio data? Okay, there are no questions; let's move on.

Audio data: what do we mean by audio analytics, by putting ML algorithms on audio? It's a very similar definition: you need to process, interpret, and extract meaningful information from different types of audio signals. Now first, let's clarify what we mean by audio. By default, most people understand audio to mean spoken language, like I'm speaking right now and you're hearing, and that is absolutely fine, but there are also non-speech sounds, and we'll see some examples of those; they have interesting applications of their own. And of course there is music, which can be with or without vocals. All of this falls under the domain of audio analytics. We talked about this briefly yesterday when somebody mentioned Alexa. Alexa and Siri are called voice-based conversational AI systems. If you look at how the path works, going left to right, the first module is called the speech interface: automatic speech recognition, which falls under the domain of audio analytics and converts audio to text. Then you have a second module in the text domain, natural language understanding. Once you understand the language, you go back to traditional apps or programs where, based on what the user has said, you can trigger actions: get weather information, get stock information, pull search results, what have you. The result comes back, and a dialog manager puts it in the natural-language context of the previous dialogue the user was having with the machine; finally you send it back through text-to-speech, which converts text back into audio. This is another example emphasizing that, even though we are covering each of these modules separately so you understand what they are, real-world solutions stitch many of these things together. What we are showing here is the speech interface, where audio analytics algorithms come in, the NLP-related modules in the middle, and traditional cloud-based API services, web services, and the backend: a great example of how different pieces come together to solve real-world problems.

Okay, let's start with the first audio analytics problem: speech recognition. Here is the problem statement: you are given spoken language. I'll pause here. If you have not thought about what spoken language looks like, if you haven't studied this in your undergrad or high school days: spoken language is a sound wave. Raw audio is basically a sound wave. The problem statement is: you're given this waveform, and you are asked to generate text out of it. That's what speech recognition is defined as. There are many, many applications of this field. The voice assistants we talked about, Alexa and Siri, are typical examples, as we saw.
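The speech-interface pipeline just described can be sketched end to end. Every stage below is a stub standing in for a real model or web service, purely to show how the modules chain together.

```python
# Toy sketch of the voice-assistant path described above:
# speech recognition -> natural language understanding -> backend service
# -> dialog manager -> text-to-speech. All stages are hypothetical stubs.

def speech_to_text(audio: bytes) -> str:
    # ASR stub: pretend we transcribed the waveform.
    return audio.decode("utf-8")

def understand(text: str) -> dict:
    # NLU stub: extract an intent with toy keyword matching.
    if "weather" in text.lower():
        return {"intent": "get_weather"}
    return {"intent": "unknown"}

def backend(intent: dict) -> str:
    # Backend stub: a real system would call a weather API here.
    if intent["intent"] == "get_weather":
        return "sunny, 24 degrees"
    return "sorry, I can't help with that"

def dialog_manager(result: str) -> str:
    # Wraps the raw result in conversational context.
    return f"Right now it looks like: {result}."

def text_to_speech(text: str) -> bytes:
    # TTS stub: a real system would synthesize a waveform.
    return text.encode("utf-8")

def assistant(audio: bytes) -> bytes:
    text = speech_to_text(audio)
    intent = understand(text)
    result = backend(intent)
    reply = dialog_manager(result)
    return text_to_speech(reply)

reply = assistant(b"what's the weather today")
print(reply.decode("utf-8"))
```

The audio-analytics piece lives only in the first and last stubs; everything in between is the text-domain and backend work, which mirrors how the modules divide up in the diagram.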
Speech-to-text transcription is very popular in medical and legal domains, where a doctor or lawyer dictates a case, again a waveform recorded as audio, and the objective is to run it through an AI algorithm which automatically generates the text. This is also used for subtitles and captions: for example, you run a speech-to-text engine on a movie and have the subtitles created automatically; that's speech recognition. IVR systems: you call a customer support line, the system prompts you to speak, and then the system on the other end tries to understand you and respond to what you are saying. That's an example of how speech recognition is used, very common in customer service, bill payment, and so on. Voice search: if you are asking Alexa or Siri to recommend restaurants near where you are, that's an example of voice search. Accessibility: it's very useful for people with disabilities; for visually impaired people, for example, speech is a great help. It's also used in hands-free features, where you're driving a car, press a button, and ask your car to do something: that's speech recognition behind the scenes. Language translation: if you're trying to enable conversation across languages, I think this example came up yesterday when somebody mentioned the United Nations, that's where this comes in. The translation itself is text, so it is NLP, but remember that since the language is spoken, you first need to convert the spoken audio waveform into text; that is the prerequisite, and then of course you can use NLP techniques to do the language translation.

Here is another category of use case: speaker recognition. Here your objective is not to understand the content of the audio but to identify who is speaking. It's almost like matching a voice signature, kind of like matching the thumbprint of different people.
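One common way to frame this voice-signature matching is to assume some upstream model turns each utterance into a fixed-length "voice embedding," and then compare embeddings. The vectors and the 0.9 threshold below are made up for illustration, not taken from any real system.

```python
import numpy as np

# Verification = "is this new voice sample close enough to the enrolled
# voiceprint?" measured here with cosine similarity between embeddings.

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify(enrolled: np.ndarray, attempt: np.ndarray,
           threshold: float = 0.9) -> bool:
    return cosine_similarity(enrolled, attempt) >= threshold

rng = np.random.default_rng(42)
alice = rng.normal(size=128)                           # enrolled voiceprint
alice_again = alice + rng.normal(scale=0.1, size=128)  # Alice, new sample
stranger = rng.normal(size=128)                        # somebody else

print(verify(alice, alice_again))  # same speaker: high similarity
print(verify(alice, stranger))     # different speaker: low similarity
```

This framing also shows why the deep-fake question coming up below matters: if a cloned voice produces an embedding close to the enrolled one, a pure similarity check cannot tell the difference.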
so the problem statement is you have been given spoken language again audio waveform and you are supposed to identify and verify who this individual is Right several applications phone banking is one that uses it though you might not immediately think that that's what's happening especially if you're doing transactions using phone banking using spoken language um if there are high value transaction banks have now started implementing menting um phone bins or your speech speaker recognition so that if you are the person who is um in charge of you are the person who's holding the account then they kind of run an algorithm to see if that it is you who speaking and doing the transaction and not somebody who's who has uh who has imitated your voice or trying to push a transaction through truth from your account right Voice assistance again they sometimes use speaker recognition otherwise you have the danger of your home being controlled by strangers who may be able to hack into the system right law enforcement uses this uses it pretty regularly um you are trying to you have speech recordings using phone tapping systems and you're trying to identify who is speaking is it the person that that you're interested in recording is it that person so can you identify who's speaking in a t phone tab right home security kind of like voice assistant ux customization right this is this is this is an interesting one right um in applications like smart TV um depending on who is giving the commands can you personalize the user experience by looking at it who in the family is talking to the smart TV can you customize the user experience so that's an interesting one vcle control um again uh if if you don't want the car to be controlled um by people who don't own the car so you can do speaker recognition in the case of vcle control right um here's another set of examples this is called sound okay uh good question so there's a comment from Manoj uh cannot deep fake clone our voice and fool 
the vi verification models good question man yeah so let me just set the context there has been this there there have been lots of cases in the last three months where deep fake which we talked about in the first class the ability to create create fake images and videos featuring other people who may not have actually been part of the image or the video is called Deep fake but deep fake has also now hit audio uh and it is now absolutely possible to clone the voice of somebody so if you have somebody and and what I heard was it is less than a minute of Voice or something so if I have somebody's less than a minute voice sample then it is possible to create a deep fake um of of the that person and that's absolutely true and and and that is a challenge for speaker recognition systems um and as we talked about right A lot of these things with generative AI where images can be generated where videos can be generated where text can be generated where audio can be generated are going to eventually make AI almost like cyber security in the sense of uh the there are crypto photographers and there are Crypt analist people who are trying to create something and people who are trying to break something so I expect this field to continue this like this for for a long time right where the the generative AI will get better in creating more realistic imitations and images and video and audio and then on the other side somebody will keep coming up with even better ways to detect and recognize it so I see this as a cat and mouse game which will continue for some time right okay um here's another example this is called sound analysis right now um note that we have explicitly calling out this is non-speech uh the problem statement is you're given an audio file and your objective is to identify and classify sounds from the recording uh into different kinds of sounds that are uh present in that audio recording right um it's common to break this down into bioponic sound which are sounds 
of animals and other life forms; geophonic, which covers environmental factors like rain, waves, and leaves rustling; and anthrophonic, which is human-created. Those are the different kinds of sounds present in an environmental recording, and there are different applications. One interesting application: if you have a large factory, you can use sound analysis to check whether a machine is malfunctioning, because the sound it produces, especially with a mechanical defect or failing equipment, changes its signature, and you can use that to flag that something unusual is happening in the factory. Healthcare monitoring is interesting too: there are now wearable devices that transmit signals back to the hospital, and of course no human is sitting there listening, so if a device the patient is wearing can remotely detect an irregular heartbeat, you can flag that something is wrong and draw a clinician's attention to it. Home security is another use case: if the home is locked and you detect sounds like glass breaking, or an alarm going off in a nearby house, you can flag that using audio processing. Vehicle health monitoring: sensors or microphones in the car are essentially listening to how the engine is running, the sounds it makes or the vibrations in the chassis, and abnormal or unexpected sounds can be used to flag that something is wrong with the car. Disaster response: if there's an earthquake or a building collapse, you can send in a microphone on a long wire, listen for human sounds, and identify survivors using sound recognition. Okay, a question from Parag: can geophonic pattern analysis help with early detection of earthquakes? That's a great question, and I don't know the answer; you might want to look up whether geophonic patterns can be used for early earthquake detection. I think the day before yesterday the BBC covered something saying that certain patterns in the waves are roughly a 15-day early indicator of oncoming earthquakes, but I don't know whether it was related to sound. Wildlife monitoring: animals make sounds, elephants, lions, tigers, so microphones, which are much cheaper than cameras, can be placed at different points inside a sanctuary or forest to track how animals move around, and people who study the life sciences use this to identify species and measure biodiversity. Can we use this for security, for tracking drones? Drones do tend to make noise because they use mechanical components to achieve flight, but your microphones have to be powerful enough; there are things called directional microphones, which in some sense are a substitute for radar, so directional microphones for tracking drones in the dark might be an interesting use case to look into, though I'm not aware of whether it has been done. Urban sound monitoring, tracking noise pollution and the environmental impact of traffic, is another interesting use case. It's also used underwater in marine biology, tracking how life forms below the ocean surface are moving; sonar is a kind of sound, and you can apply sound analytics there. So those are some examples. This was us catching up on what we couldn't finish yesterday; I wanted to complete lecture two.
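The machine-monitoring idea above, flagging equipment whose sound signature changes, can be sketched in a few lines of Python. This is a minimal illustration on synthetic sine waves, not a production pipeline; the function names and the 120 Hz hum / 900 Hz rattle frequencies are made up for the example.

```python
import numpy as np

def spectral_signature(wave):
    """Magnitude spectrum of an audio clip, normalized to unit energy."""
    mag = np.abs(np.fft.rfft(wave))
    return mag / (np.linalg.norm(mag) + 1e-12)

def anomaly_score(wave, baseline):
    """How far a new recording's spectrum is from a known-healthy baseline."""
    return float(np.linalg.norm(spectral_signature(wave) - baseline))

sr = 8000                                    # samples per second
t = np.linspace(0, 1, sr, endpoint=False)    # one second of audio
healthy = np.sin(2 * np.pi * 120 * t)                  # steady 120 Hz machine hum
faulty = healthy + 0.8 * np.sin(2 * np.pi * 900 * t)   # plus a 900 Hz rattle

baseline = spectral_signature(healthy)
print(anomaly_score(healthy, baseline))  # ~0.0: matches the healthy signature
print(anomaly_score(faulty, baseline))   # clearly larger: flag for inspection
```

The same pattern, compare an incoming signal's spectrum to a known-normal one, underlies the factory, vehicle, and heartbeat examples; real systems use richer features than a single FFT, but the principle is this.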
Here's what we did in lecture two. The first thing was to set the vocabulary. Artificial intelligence is a kind of problem statement or objective: to replicate human capabilities like reading, writing, vision, and speech; that's why unstructured data fits so naturally into the AI world, because it deals with text, video, audio, and so on. Machine learning is a set of tools, mathematical and computational, which can work on different kinds of data and be used for many purposes. Deep learning is a subset of machine learning that uses deep neural networks, inspired by the human brain and its anatomy, and often requires very large computational power and datasets. And finally, data science is the application of machine learning, typically to structured data, and it includes more than just machine learning: visualization and exploratory data analysis also fall under its domain. We looked at different data formats and re-emphasized our working vocabulary of AI versus ML versus data science; we looked at NLP, text analytics, as a form of AI; at computer vision as a form of AI; and at audio analytics as another branch of AI. That's where I'll stop. I'll stop sharing the screen for a minute as I pull up the slides for lecture three; if you have questions, I'm happy to answer them as I bring up the next set of slides. Okay, here we go. Lecture three is still about AI, but we're going to change our lens: lecture one was history, lecture two was data, and lecture three is going to be computational. We're going to get slightly more technical, slightly more mathematical, but I'll try to keep the mathematics to a minimum since this is a first, introductory course; hopefully you'll still get a bird's-eye view of how this field looks from a computational perspective. The way to think about this, as I said, is that the hero of our story today is computation. One way of arguing is: it doesn't matter whether you have structured data or text or images or videos or audio, because at the end of the day it's all bits; no matter what form your data takes, when data meets the metal it's all bits, because that's all computers understand. Our computers today are binary; unless we move to quantum computing, all our computers are binary, meaning all they understand is zeros and ones. And there's a whole field of binary mathematics: if you studied computer architecture or computer science in your undergrad, typically a first-year mathematics course covers Boolean logic and binary arithmetic, and that's where this field starts. Just to set the context, here's what we mean by binary mathematics. The numbers we use day to day are in the decimal number system, because there are ten digits, from zero up to nine, and of course you can do mathematics on them: add, subtract, divide, multiply. The same operations can be done in binary, where there are only two digits, zero and one; there is no two, no three, no four. I won't go into how it all works, but for example the number five in decimal is represented as 101 in binary, so decimal numbers can be represented in binary form; and you can do arithmetic in binary too, for example adding the two binary numbers 011 and 100, which gives 111.
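To make the binary arithmetic concrete, here is how those same facts look in Python, which has built-in support for binary literals and bitwise operations:

```python
# Decimal five is written 101 in binary: 1*4 + 0*2 + 1*1 = 5.
print(bin(5))              # '0b101'
print(int("101", 2))       # 5

# Binary addition: 011 + 100 = 111 (that is, 3 + 4 = 7).
print(bin(0b011 + 0b100))  # '0b111'

# Bitwise logical operations on the same two numbers.
print(bin(0b011 & 0b100))  # AND -> '0b0'
print(bin(0b011 | 0b100))  # OR  -> '0b111'
```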
And of course you can do logical operations in binary as well; for example, 1 AND 1 is 1. If you didn't follow that, it's fine, and if you don't come from a mathematics background, that's also fine. The key point is that the view of the world we are taking today says all data is fundamentally binary data, so rather than a data view of AI, I want a computational view of AI: give me a framework to understand how this whole field of AI works. That's what we are doing today. Here is what we'll actually be using a lot today, the key idea: even though we talked about structured, semi-structured, and unstructured data in the form of text, images, audio, and video, what we're going to spend the next 45 minutes on is this single statement, that all data can eventually be represented as relational data, in the form of a spreadsheet, a table, or a matrix. That's the trick: all data, even if it initially looks unstructured as text, images, video, or audio, can be represented as a spreadsheet. This is a very rich field; there are hundreds if not thousands of PhDs being issued in computer science around this one statement, how to translate unstructured data into relational form, and it continues to be a rich field. It may seem difficult or counterintuitive, and as I said, we'll spend the first 45 minutes just talking about it. If you start with spreadsheets, Google Sheets or Microsoft Excel, or tables sitting in relational databases or an enterprise data warehouse, they of course fit very naturally into a relational format. And just to emphasize: when I say relational data, you should immediately imagine a spreadsheet. I'm assuming everybody has used a spreadsheet like Google Sheets or Microsoft Excel, so when I say relational data, or matrix, I'm talking about a spreadsheet, if that helps. So spreadsheets and relational tables fit this relational format very naturally. What we are going to show is that text data can be engineered to fit a relational form: text data, for example a newspaper article or a website, does not naturally fit a relational format; you have to apply some intelligence, do some engineering, do some computation to make it fit, and we'll see how. Similarly, image data can be engineered to fit a relational format, and audio data can as well, and this whole field is called feature engineering. Whether we call it a relational data format, a matrix, or a data frame, it is the same thing. So on this slide I'm going to introduce the mathematical foundation of data in AI and ML, and this weird-looking equation; I'll read it out once, and by the end of the slide I'll make sure you understand what the math is saying. For those familiar with the notation, it says that a data frame X belongs to the set of real numbers R of dimension n by p: X ∈ R^(n×p). If you didn't understand that, don't worry, you will by the time we finish this slide; and if you still don't, ask me. So let's look at the format we are interested in, the one you're seeing on the screen right now: this is what a spreadsheet looks like, a Google Sheet or an Excel sheet; it is also what a matrix looks like in mathematics; it is what computer scientists call a data frame and what database engineers call a table. All of these names refer to the same data format, the table-like, Excel-like, spreadsheet-like format in front of your eyes right now on the screen. Let me explain what's going on. This capital R is the set of real numbers: unless you've studied mathematics formally at higher levels, essentially every number you can imagine is a real number, so -0.6, +4,200, a million, all of these are real numbers. The first thing the math says is that each row is identified by a bold x with a subscript i: the first row is x_1, the second x_2, the third x_3, the fourth x_4, and in general the i-th row is x_i. If you look at one particular row, say the one highlighted here in purple, it says the i-th row x_i is composed of x_i1, the number in the first cell, x_i2, the number in the second cell, x_i3, and so on up to x_ip, where p is the number of columns. In this table there are five columns, so x_i looks like (x_i1, x_i2, x_i3, x_i4, x_i5). Another way of saying this is that this five-dimensional vector, or in general this p-dimensional vector, belongs to (that weird ∈ symbol means "belongs to") the set of five-dimensional real-valued vectors; the R means real-valued, and this p is the same p as the number of columns. So going forward we work with vectors of dimension p; for this dataset p = 5, there are four rows and five columns, each row has five elements, five cells, and so x_i belongs to R^5. And we assume there are n rows; here n = 4: one, two, three, four. And this data frame or matrix
or spreadsheet is going to be represented by capital X. When we have a data frame, a matrix, a spreadsheet, or a relational data table, capital X is our general way of referring to it. We are saying that this matrix has n rows (here n = 4) and p columns (here p = 5), so X belongs to R^(n×p), and every cell can only contain real-valued numbers. Let me pause and see if there are any questions. Going forward we will work with X ∈ R^(n×p), our relational data model; it goes by many names: a spreadsheet, a Google Sheet, a matrix, a data frame, a data table in a relational store, and so on. Everybody comfortable with this? X is the name of the data frame: if you have a spreadsheet you are working with, I'll call that spreadsheet X; if you have a table, I'll call that table X. n is the number of rows; this table has four, so n = 4. p is the number of columns; one, two, three, four, five, so p = 5. So that is n, that is p, and, Gan, basically what we are saying is X belongs to R^(n×p). Mohammad is asking what the need is to understand this; you will see in the next five slides. This is the beginning of machine learning; we are introducing the fundamental mathematics. "Is R just 0, 1, ..., n?" No, those are integers; R is bigger than that. A value could be 0.3, or 0.8, or 2/3; R is the set of all real numbers. All right, let's move on; hopefully that answered the questions. "How can a cell identifier be negative?" Sorry, Parag, I did not understand that at first; the value contained in a cell can be negative, and that's not a problem. For example, one of the columns might capture temperature, which can be minus 30 degrees Centigrade, so it's absolutely fine to have negative data in the real world; think of your bank balance. "We identified the data with n rows and p columns; I understand the values inside the cells can be negative, but how can n and p be negative?" I did not say n and p can be negative; I said that cell values can be negative. Yes, thank you, n and p are positive integers, as Richard pointed out. Okay. There's also an example that Gosn shared that might be helpful for people looking to learn more, but this is hopefully the basic introduction. These things go by many names: people from a mathematics background call this a matrix, people from a programming background call it a data frame, people from a database background call it a data table, and people in office and enterprise settings call them spreadsheets; fundamentally they are all the same thing, and in the strict mathematical notation we say we are working with data X ∈ R^(n×p). The interesting thing, as we pointed out, and what we're going to talk about for the next few minutes, is that no matter what data you have, whether text, video, image, or audio, we are going to engineer it, process it, compute on it, until it fits this data model, X ∈ R^(n×p). First we'll see how to do it, and then you'll see why this becomes such a powerful idea when you bring different formats into the same data model. Here is the hint, and hopefully, Mohammad, this answers your question: most machine learning and AI models expect exactly this data format as input, so the name of the game is to engineer your data to fit this model.
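As a quick illustration of the X ∈ R^(n×p) idea, here is a tiny data frame built with NumPy. The values are invented for the example; the only point is that n and p fall straight out of the shape of the matrix, and each row is a vector in R^p.

```python
import numpy as np

# A tiny data frame X in R^(n x p): n = 4 rows (observations),
# p = 5 columns (features). Any real numbers are allowed, negatives included.
X = np.array([
    [ 1.5, -0.6,  3.0, 12.0, 0.3],
    [ 2.1,  0.0, -1.2,  9.5, 0.8],
    [-3.4,  4.2,  0.7,  7.7, 0.1],
    [ 0.9, -2.8,  5.5, 11.2, 0.6],
])

n, p = X.shape
print(n, p)        # 4 5

x_2 = X[1]         # the second row x_2, a vector in R^5
print(x_2.shape)   # (5,)
```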
Then the algorithms, the machine learning and AI models, suddenly become open to you for doing everything we talked about yesterday and today. All of those things become possible, but we first have to do the work of making the data fit this model; the prepackaged algorithms assume this data model, you put the two together, and voila, you have an implementation of ML or AI. Hopefully that gives you a sense of why we are studying this. "What is ∈? Where do you see ∈?" Oh, that weird-looking symbol: as I said, ∈ is a short mathematical way of writing the words "belongs to", so we say X belongs to R^(n×p), the space in which X falls. Another way of saying it is that X is a matrix of dimensions n rows and p columns where every cell is a real number; R is the set of real numbers. Sasna is asking about what counts as a real number; I think, Sasna, you are confusing real numbers with integers. The real numbers subsume all rational and irrational numbers. So, the field of study which transforms text, images, and videos into this X ∈ R^(n×p) relational data model is called feature engineering, and as I said, it is a very rich space in machine learning and data science, for multiple reasons. This is where domain experts bring in their domain knowledge: when you ask how human knowledge is fed into machine learning and AI systems, this is the answer, it happens in the feature engineering process. So that is one way of thinking about it: domain knowledge is brought into AI and ML through feature engineering. More recently, in the last decade, we have seen innovation in deep learning where you use not domain knowledge but the architecture of your AI model, designed by studying human biology and the structure of the brain, and you let the model itself do the feature engineering, so the automation of feature engineering. Both approaches are still valid and both are worth studying. So let's jump in, starting with text: how do you convert text into a relational data model, how do you fit it into a spreadsheet? Here is the key idea, the basic starting point: if you look at a text document, you can describe it in terms of something called a bag of words. (You'll also see here what I mean by bringing in domain knowledge; you'll see it on this very screen, so stay with me.) Here is an English sentence; suppose it is "Although about the bird, the bird, bird, bird, bird." Here's another English sentence: "You heard about the bird." And a third: "The bird is the word." What you do is create a table structure, a spreadsheet structure, a matrix structure (going forward I'll use the word matrix; I like the word matrix, but whenever I say matrix you can just as well hear spreadsheet), where each sentence or document becomes a row, and all possible words in the English language become the columns, so every word in the English language becomes a unique column. Then I look at a document or sentence and ask: how many times did the word "bird" occur? Here it occurred five times, so I put a five in that cell. How many times did the word "about" occur? Once, so I put a one. How many times did the word "heard" occur? It did not occur, so I put a zero, and so on for all the words. Here I'm showing only seven columns, but you can imagine the number of columns actually being 30,000 or 100,000; 30,000 is the typical vocabulary of a human, and most text can be captured with 100,000 dimensions. So if you have 100,000 columns, any document can be captured like this. This is a relational data format, a matrix, and this approach of representing text, where each column captures the frequency of a word, is called the bag-of-words approach to translating text data into a matrix or relational data format. Note that we completely ignore the sequence in which the words occur, and we ignore any notion of grammar; we treat a text document as just a bag of words, hence the name. Let me pause here and see if there are any questions; this is one of those fundamental ideas on top of which you can build more and more complexity. Okay, the first question, from Rahul: how does this help the model understand the sentence? That will come in the second part of the class, when we see how machine learning algorithms use these relational data models; for now, stick with me, our objective is to convert text into a relational data model, because most AI/ML algorithms expect this data model by default. There are lots of questions coming in; one second, something's wrong with my machine, give me a minute. A comment from G: "I feel like we are heading to the identity element." I'm not sure what that means, so I'll skip it. "I assume the transformer model that ChatGPT uses leverages this." Not this exactly, but the last bullet point on this slide is something ChatGPT-style models use; I'll talk about it towards the end of this slide. Gladson is asking: does it mean the table is already filled with the English words, and then we take the counts? Absolutely correct, Gladson: the columns are predetermined, and if you're working with English, every single word in the English language becomes a column.
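The bag-of-words construction just described can be written out in a few lines of plain Python. This is a deliberately minimal sketch (no lowercasing, stemming, or other preprocessing) using the three example sentences from the slide:

```python
from collections import Counter

documents = [
    "although about the bird the bird bird bird bird",
    "you heard about the bird",
    "the bird is the word",
]

# The columns: every distinct word in the corpus, in a fixed order.
# (In the lecture's framing, this would be every word in the language.)
vocabulary = sorted({w for doc in documents for w in doc.split()})

def bag_of_words(doc):
    """One row of the matrix: a count for each vocabulary word."""
    counts = Counter(doc.split())
    return [counts[w] for w in vocabulary]

matrix = [bag_of_words(doc) for doc in documents]
print(vocabulary)
print(matrix[0][vocabulary.index("bird")])   # 5: "bird" occurs five times in doc 1
print(matrix[0][vocabulary.index("heard")])  # 0: "heard" never occurs in doc 1
```

Each document has become a fixed-length row of numbers: exactly the X ∈ R^(n×p) shape the models expect.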
You fill in the counts based on the text document you have. How does this differ between languages? First you need to know which language it is, and then the columns depend on the language: with English there might be 100,000 columns; with French, all the French words appear as columns; with Tamil, all the Tamil words; and so on. "What would feature engineering look like if the data is already in table form?" Then you don't need to, well, let me rephrase: you can still do feature engineering, you just don't have to. One example, Manoj, is creating a new column that is the product of two existing columns; that falls under feature engineering. Guan is asking, for n × p, whether this example is 4 × 8 or 3 × 7, that is, whether we ignore the header row and the label column. Yes, we ignore the header row and the first column, so it's 3 × 7. What about dots and commas? Utal, as I said, this particular bag-of-words approach ignores grammar, so dots and commas go away, though nothing stops you from treating a dot or a comma as a token too; that's typically not done, and there are more advanced methods that do take punctuation into account. And "or" is absolutely a valid English word, so there's no issue with "or" being a column in the dataset. Okay, let's move on. The second approach is an extension of bag of words and is called TF-IDF, short for term frequency-inverse document frequency. It is very similar to bag of words but adds a nuance: we need to normalize. Suppose I have a collection of documents, say all the novels written in the 20th century. I will still use exactly the bag-of-words approach, but I will give different weightage to the frequency of each word depending on how commonly that word is used in the corpus; if I'm analyzing 20th-century novels, certain words will be very common and others rare, so I add a weighting factor to each word based on how frequently it occurs in my corpus (a corpus is simply a collection of documents). The next idea addresses one of the criticisms of bag of words, namely that each word is treated independently and the sequence in which words occur is completely ignored. Instead of treating each word separately, you can treat every sequence of two words. (To repeat for Parag: TF-IDF stands for term frequency-inverse document frequency.) With n-grams, instead of working with single words you work with pairs of words, pairs in the sense of sequences: for example, "about the" is one bigram and "the bird" is another. The columns, instead of being unique words, become pairs or, in general, sequences of words. What you're trying to do is capture some notion of sequence in your representation: how often did this sequence of words (an n-gram can be a bigram, a trigram, a four-gram) appear in the document? So instead of describing a text document as a bag of words, you describe it as a bag of n-grams. Then there are POS, parts-of-speech, tags: you first run an engine that labels each word as a noun, a verb, an adjective, and so on, and you replace or annotate each word with a noun tag, a verb tag, or an adjective tag.
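Here is a small sketch of the two ideas just introduced, bigrams and TF-IDF weighting. The TF-IDF formula used (raw term count times the log of corpus size over document frequency) is one common variant among several, and the helper names are my own:

```python
import math

documents = [
    "you heard about the bird".split(),
    "the bird is the word".split(),
]

def bigrams(words):
    """Adjacent pairs of words, e.g. ('you', 'heard')."""
    return list(zip(words, words[1:]))

def tf_idf(term, doc, docs):
    """tf * log(N / df): words that appear in every document score zero.
    Assumes `term` occurs somewhere in `docs` (df > 0)."""
    tf = doc.count(term)
    df = sum(1 for d in docs if term in d)
    return tf * math.log(len(docs) / df)

print(bigrams(documents[0]))
# "the" is in both documents, so its weight collapses to zero;
# "word" appears only in the second document, so it keeps a positive weight.
print(tf_idf("the", documents[1], documents))
print(tf_idf("word", documents[1], documents))
```

This is the normalization the lecture describes: very common corpus words get down-weighted, distinctive words get emphasized.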
What you're trying to do here is get a handle on the grammar: this is a sentence where, say, a noun follows a verb, then an adverb, then an adjective. So instead of 100,000 columns, maybe you now have 120,000: 100,000 representing words and another 20,000 representing grammatical concepts, and that too is captured about the document. Then there is named entity recognition: suppose you have the names of all the companies in the world, all the cities, all known person names, and you capture whether the document contains those entity names; that is named entity recognition. Then you have sentiment scores: you have a lexicon, a dictionary in which, for example, "great" is marked as a positive-sentiment word and "horrible" as a negative-sentiment word, and by looking words up in that lexicon you can assign a positive or negative score to a document. You can also assign a readability score, depending on how complex, hard, or rare the words in the document are. The point I'm making is that you can describe a document in a relational data format by capturing many different aspects that come from the domain, and the domain here is, of course, language: somebody who understands the language well, English or whichever language we're working with, can create different kinds of features, shown in green here, which tell us what the document is about. A document can be described in terms of its words; in terms of the entities it talks about, organizations, cities; in terms of sentiment; and in terms of readability. And then there is the cutting edge, something called word embeddings, which is what GPT models such as GPT-4 use. Word embeddings are a very interesting idea: you create a new space that captures meaning, such that two words that fall close together in that space mean similar things, so you represent words in a semantic space (semantic meaning "relating to meaning"). A lot of you may not follow this yet, but rest assured, when we go further into the classes on text processing and deep learning, you will understand. With word embeddings, instead of representing a document in terms of its words, you represent it in terms of what those words mean, where the meanings are expressed not as language but as vectors; that's where word embeddings come in, and as I said, they are the current state of the art. There's a question from Parag: can I repeat the description of n-grams? I can, but I seriously request that you pay attention while I'm speaking, because every time I repeat something, which I'm happy to do, we lose time, the class doesn't finish on schedule, and that has a ripple effect. So here is an n-gram: if you look at the sentence "You heard about the bird," you can think of it as a sentence of five words, which is absolutely fine, but you can also ask: the word "you" followed by the word "heard", what do you call that sequence of two words? In linguistics, a sequence of two words is called a bigram. Similarly, "heard about" is a sequence of two words, another bigram; "about the" is another bigram; "the bird" is another bigram; "heard
bird is another bigram — it doesn't appear in this sentence — but instead of looking at one word at a time you're looking at two words at a time; that's a bigram. Look at three words at a time and that's a trigram. So instead of representing a document in terms of words, you can also represent it in terms of n-grams. You're absolutely right, Sasna: you can say that n-grams try to capture context, because they capture the sequence in which words occur. Let me take a pause here and see if there are any questions on the different ways of representing — engineering, computing — text, which is unstructured, into the structured data format we're seeing on the screen. Okay, now let's move on to images. The idea is that each image can also be described in terms of a relational data model — a matrix — and the most obvious way to do this is to think of an image in terms of pixels, raw pixels. So look at the picture on the left-hand side here, which is the number eight. There's a question: how are word embeddings represented in a relational structure? Unfortunately a full explanation won't land in the first class, so think about it like this: instead of the columns representing words, the columns in this case are meanings of words — dense meanings. Why does that matter? Because of synonyms. For example, if one column was "king" and another was "queen", in the bag-of-words representation there would be two separate columns, but with word embeddings each word gets a vector of values, and "king" and "queen" get different but related vectors. So think of word embeddings as creating a vector representation of every word such that the vectors are continuous, and words which mean the same thing have similar vector representations. Gladson asks: where can we read more details about text features? Well, in today's day and age the answer is ChatGPT, or the answer is Google. Just search for how text is translated into features and you will get enough information to go and read. Google and ChatGPT are your friends — there's enough content out there, and once pointed at it you should be able to follow up on your own. Okay, so let's go with image features. If you look at an image, an image is basically pixels. This image here is of the number eight. Think about how computers represent color: on a grayscale, black is represented as 0 and white as 255. So you can take this image and, for each pixel — let me zoom in, because this might be hard to read — you can see what I mean when I say you can represent an image as a spreadsheet. The image on the right is nothing but a spreadsheet where each cell holds a real-numbered value: here's your first row, your second row, your third row, with numbers in each cell. The whiter something is, the higher the number; the blacker it is, the closer to zero. So you can see how images can be thought of as fitting a relational data model. Everybody comfortable with this? Okay, let's zoom out. So that's the raw pixel representation — images can be represented that way.
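To make the pixel idea concrete, here is a minimal sketch (the 5×5 "image" is made up for illustration) showing that a grayscale image is just a matrix of numbers between 0 (black) and 255 (white), and that flattening it gives one row of a relational table:

```python
import numpy as np

# A tiny made-up 5x5 grayscale "image": 0 = black, 255 = white.
image = np.array([
    [  0,   0, 255,   0,   0],
    [  0, 255,   0, 255,   0],
    [  0,   0, 255,   0,   0],
    [  0, 255,   0, 255,   0],
    [  0,   0, 255,   0,   0],
])

print(image.shape)  # (5, 5) -- 5 rows, 5 columns, just like a spreadsheet
print(image[0, 2])  # 255 -- the pixel in row 0, column 2 is white

# Flattened, the image becomes one row of a relational table with 25 columns.
row = image.flatten()
print(row.shape)    # (25,)
```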
Now, notice again how domain knowledge comes in. Domain knowledge says: if it's a color picture, not grayscale, I can also represent it as a histogram of all the different colors it contains — different images have different colors. So in my relational data model, each row now becomes an image, each column becomes a color name or color ID, and in each cell I put the frequency with which that color appears in that image. That's another way of representing images in a relational data format: the notion of color comes in from domain knowledge, and you represent an image as a color histogram. You can also represent it as a histogram of gradients. For those of you who have worked with images — through painting, image processing, or photography — the idea of gradients is how quickly colors and brightness change. So you can create a mathematical function, a histogram — HOG stands for histogram of gradients (more precisely, histogram of oriented gradients) — and define the image in those terms. Some pictures have sudden transitions: a drone photograph of a road in an urban setting will have very sharp gradients, whereas a nature photograph of mountains, trees, sky, and a river will have much more gradual ones. So you can use that to describe a picture in terms of its histogram of gradients. Then there's something called scale-invariant features. I won't go too deep into this, but these are very interesting features which are robust to scale — and the achievement is the important thing to understand. Take a photograph of, say, 3 inches by 5 inches, and blow it up into a 30-inch by 50-inch photograph. Does your pixel representation change? Absolutely — your matrix got bigger. Does your color histogram change? Maybe, maybe not. Does your histogram of gradients change? Probably not. But what happens if you take that 30-by-50 photograph and rotate it by 30 degrees? Then your histogram of gradients will change. SIFT features are a specific kind of feature designed so that they do not depend on the size of the image, whether the image is rotated, or whether it was taken in a dark or a light setting — and multiple PhDs came out of the SIFT innovation. Even if you change the intensity or the color, the features don't change; these are scale-invariant features. Then there's something called bag of visual words — think of this as describing an image in terms of visual words, which is a good first proxy for the objects or segments the image contains. Now, all the features we've seen so far were handcrafted. What do I mean by that? Experts — researchers, PhD students in computer science departments — tried out different kinds of features derived from images, tested whether they helped on computer vision problems, and published papers; other researchers adopted them, and the ones that did well became available as libraries in Python and other languages. What is more common in the last ten years is to move away from this model of handcrafted, human-designed features and let machines and algorithms figure out the features themselves. This is where deep learning — for example convolutional neural networks, a form of deep learning — comes in: you feed the raw pixel image to a deep learning network, and it automatically derives the features it needs.
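Here is a minimal sketch of the color-histogram idea just described — the tiny 2×3 color image is made up for illustration; each pixel is an (R, G, B) triple, and counting how often each color appears gives one row of the image's feature table:

```python
import numpy as np
from collections import Counter

# A made-up 2x3 color image: each pixel is an (R, G, B) triple.
image = np.array([
    [[255, 0, 0], [255, 0, 0], [0, 255, 0]],
    [[0, 0, 255], [255, 0, 0], [0, 255, 0]],
])

# Color histogram: how often does each exact color appear in the image?
pixels = [tuple(p) for p in image.reshape(-1, 3)]
histogram = Counter(pixels)

print(histogram[(255, 0, 0)])  # 3 -- pure red appears three times
print(histogram[(0, 255, 0)])  # 2 -- pure green appears twice
```

Real systems usually bin similar colors together rather than counting exact triples, but the relational-model idea is the same: one row per image, one column per color bin.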
In fact, these learned features are organized as feature pyramids, because the network defines many, many features at different scales, and that is part of what makes these algorithms so powerful in computer vision. Deep learning today is the state of the art in computer vision, so you can see how the field has changed. No, that's just a Hollywood idea, Cotlin — sorry if I confused you; the reference to the movie The Matrix was not computationally relevant, just something to trigger interest in the audience. Okay, so here are audio features. We talked about how images can be represented in a relational data model and we looked at computer vision; let's also look at audio. How do you represent audio? The high-level picture is on the right-hand side. The first feature is called MFCC. The underlying idea is that a sound wave is a combination of multiple frequencies of waves — and I apologize if you haven't studied physics in high school or undergrad, but waveforms are what constitute audio; if you don't follow that, it's fine as long as you understand that audio can be represented in a relational data model. One way of representing audio is the power spectrum, where you ask: what are the different frequencies contained in this audio signal? You select the frequencies relevant for human hearing — about 20 Hz to 20,000 Hz — and ask which set of frequencies the signal contains. That is the power spectrum of a signal. So if you have an audio signal as a row and all the possible frequencies as columns, the audio signal can be represented as a combination of all those frequencies. Yes — Richard has pointed out the full form of MFCC: Mel-frequency cepstral coefficients, absolutely correct. Then there is something called chroma, where you look at the energy distribution: how much energy is contained at each pitch. This is fairly useful for music analysis, because music contains sounds at different pitches. Zero-crossing rate: how frequently does your signal cross the zero point? This is very useful for characterizing percussiveness — drums, banjo, tabla are percussive instruments, and the faster the beat, the higher the zero-crossing rate of the audio signal. RMS captures the power in the signal — a characterization of how loud the audio signal is. Formants are resonant frequencies of the vocal tract, and they are important for speech analysis, speaker recognition, and gender classification: it turns out male and female voices can be distinguished by the frequencies they use more, and that's how we can tell whether the person speaking on the other side of a phone call is male or female. Then there's fundamental frequency, which is the rate at which your vocal cords vibrate when you speak; this too can be determined from the recorded audio signal. And finally, the most complex, state-of-the-art representations today are time-frequency representations. If you pay attention to the features above, some are purely frequency-based, like MFCC, and some are temporal, like zero-crossing rate; time-frequency representations combine both the time and the frequency views.
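Two of the features above are simple enough to sketch directly. Below is a minimal, self-contained example (the 440 Hz test tone and 8 kHz sampling rate are made up for illustration) computing zero-crossing rate and RMS for a synthetic signal:

```python
import numpy as np

# A made-up one-second signal sampled at 8 kHz: a 440 Hz sine tone.
sr = 8000
t = np.arange(sr) / sr
signal = np.sin(2 * np.pi * 440 * t + 0.1)  # small phase offset

# Zero-crossing rate: how often the signal crosses the zero line.
zero_crossings = np.sum(np.abs(np.diff(np.sign(signal))) > 0)

# RMS: a measure of the signal's power, i.e. how loud it is.
rms = np.sqrt(np.mean(signal ** 2))

print(zero_crossings)  # ~880 -- a 440 Hz tone crosses zero twice per cycle
print(round(rms, 3))   # ~0.707 for a unit-amplitude sine
```

A faster beat (higher frequency of percussive hits) would raise the zero-crossing count, and a quieter recording would lower the RMS — exactly the intuitions described above.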
If you're interested in going deeper, you can look at how Fourier transforms and wavelets work — that's a very electrical-engineering kind of subject, but by all means look into it if you're interested. So those are audio features: the long and short of how sound waves can be represented in a spreadsheet, as I just described. Let me repeat, Prati, using the power spectrum as the example. Suppose you have 100 audio recordings of three minutes each. Each row in your spreadsheet becomes an identifier for one audio signal — say A1 to A100 — so you have 100 rows, each representing one of your 100 audio files. In the columns you put the frequencies between 20 Hz and 20,000 Hz, and you characterize each audio file by whether each frequency is present in that signal or not. Now you have represented an audio signal in terms of its frequency components — that's the first bullet. Does that make sense, Prati? ("Got it.") And similarly you can keep adding features — chroma, zero-crossing rate — increasing the number of columns to describe the audio signal; the more features you add, the richer the information about the signal you capture in your spreadsheet. Question: does the sampling rate influence the number of rows? I didn't catch that — can you repeat it or unmute yourself? "Can you hear me?" — yes, I can. "Thanks, Professor. I was just wondering, in your example, is each row an individual audio file, or is it a sample taken so many times per second?" No, no — it's an individual audio file. For example, if I have a music library of a million songs, each song becomes a row, so I have a million rows. "Okay, thank you." Sure. Okay, so this is our view of the world.
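The worked example just repeated — 100 audio files as rows, frequency bands as columns — can be sketched like this (the file count, band list, and random presence/absence values are all made up for illustration):

```python
import numpy as np

# Hypothetical setup: 100 audio files; for each we record which of a few
# frequency bands (in Hz) are present -- a toy stand-in for a power spectrum.
rng = np.random.default_rng(0)
n_files = 100
bands = [20, 100, 440, 1000, 5000, 20000]

# X is the relational table: one row per audio file (A1..A100),
# one column per frequency band, cells = 1 if the band is present.
X = rng.integers(0, 2, size=(n_files, len(bands)))

print(X.shape)  # (100, 6): n = 100 rows, p = 6 columns
print(X[0])     # the feature vector for audio file A1
```

A real power spectrum would have many more columns (one per frequency bin) holding energies rather than 0/1 flags, but the table shape — files as rows, frequencies as columns — is the point.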
Everything we talk about in the rest of the class will be based on this view of the world — the relational data model view. Why? Now, this is both interesting and a fair objection: why did we spend the last class saying there are different kinds of data formats, only to now say everything is relational? Well, both statements are true. There are different kinds of data formats — the way you store them, process them, engineer them, and the applications all differ — but by the time you take a data format to the algorithm and application stage, you have already translated it into a data frame, a matrix, a spreadsheet, or a database table. The reason we had the last class and the first half of today's class was to make you understand that different types of data lead to different kinds of AI/ML applications, but all of them must be engineered — computed — into this data model, and from here on we're going to work with this model. Now, some more terminology, for when you read things via Google, ChatGPT, books, and papers. The vertical things here are called attributes: attribute one, attribute two, attribute three, and so on. You will also hear the term feature — feature one, feature two, feature three. You've heard the word column — column one, column two, column three. Or you can call it a dimension — dimension one, dimension two, dimension three. I will use the word column, but authors in the field use these words interchangeably: attribute, feature, column, and dimension all refer to these vertical things. Similarly, the horizontal things are called tuple, element, or row; I will use the word row. So when I say row, imagine the horizontal purple band, and when I say column, imagine the greenish-blue vertical band. That's the data model we'll be using. We have already introduced the basic mathematics: X ∈ R^(n×p) means that whatever the data table is, I will refer to it by the letter X, I assume everything is real-numbered, and I assume there are n rows and p columns — you can fill in n and p as per your data. Well, Deepak, I'll leave it to you what is lower and what is higher; both are valid views of the world. The computational view is what we are just starting and the data view is what we have just finished — I don't know which to put on top of the other. One more thing about data before we move on. If you hear the term "big data", people are typically talking about a data model where the value of n is very high — n can be 100 million — which means you are imagining a table that is thin but tall: thin means p is small, tall means n is large. You will typically find this in enterprise data warehouses. On the other hand, if you hear "high-dimensional data", people are saying the value of p is large: in the text example p was 100,000, and with images p might be a few million. Think of this as wide data — wide because p, the number of columns, is very large — and that is what we call high-dimensional data. So if you hear these terms, that's what people mean: big data versus high-dimensional data, one refers to large n, the other to large p. And of course you can have a dataset that is both big and high-dimensional, which is what something like GPT would use: millions and millions of documents on the web (n is large), each containing text, images, and video (p is large).
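The tall-thin versus short-wide distinction is easy to see in terms of matrix shapes (the sizes below are made up for illustration):

```python
import numpy as np

# "Big data": many rows, few columns -- a tall, thin matrix (large n, small p).
big = np.zeros((1_000_000, 10))

# "High-dimensional data": few rows, many columns -- a short, wide matrix
# (small n, large p).
high_dim = np.zeros((100, 100_000))

print(big.shape)       # (1000000, 10)
print(high_dim.shape)  # (100, 100000)
```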
Okay, so now we have the foundation, and we're going to switch into the computational view of the world. It's a good time to take a break — it's 8:03; we'll take a five-minute break and come back at 8:08, and then start this part of the world, which is machine learning: the foundation for AI, deep learning, and data science. We'll be back shortly. All right, we're back. So, switching gears: we have understood data models, data formats, and how feature engineering converts different data formats into the standard relational model X ∈ R^(n×p), and we're going to start from there. The first thing we'll talk about is a field of machine learning called unsupervised learning. Unsupervised learning, if you were to say it in a few words, is fundamentally about finding patterns in data — and we'll see what that means. Here is the data model we have decided to work with, and as we said, it will be relevant for all the data formats we have been studying: eventually data will be brought into this format. So what does it mean to do unsupervised learning on top of such data? Let's first define it: it is the task of inferring a function which can describe patterns in the data — describe the hidden structure in the data. You basically view the data as a collection of rows: this is row one, row two, row three, row four, and in general row n; each bold x refers to a particular row in this dataset. Unfortunately, the same notation is often used, without clarification, to refer to the columns as well: some authors will call the columns x — the first column is x1, then x2, x3, x4, x5, and in general xp — and using the same letter tends to get very confusing when you're first entering this field. So whenever you read "my data is of this form", take a minute to double-check whether the author is talking about rows or columns; depending on context, the same notation is used for both. The trick is to look at the subscript: since there are n rows, if the last subscript is n they're talking about rows, and if it is p they're talking about columns, because the table is n × p. The more important thing — and if you have not studied machine learning before, don't worry about this bullet, but if you have — is to note that in this view of the world, when we're talking about unsupervised learning, there is no dependent variable. We'll talk more about dependent variables in the second part of the class. The first form of unsupervised learning we will look at is clustering, a sub-problem within unsupervised learning, which is part of machine learning. We'll see how it applies to both structured and unstructured data, and how it very quickly becomes, quote-unquote, AI. Here is the data model; I'll keep it at the top right of the slide. Remember, each row can be a row of marketing data or supply chain data, a row can be a text document, a row can be an image, a row can be
an audio file — whatever we have studied in the last two classes. Data will always look like this, irrespective of whether we're dealing with text, image, audio, video, marketing data, or supply chain data. So we are given X ∈ R^(n×p), which is basically the view at the top right. Now, what is clustering? Clustering starts by asking: what happens if I represent this data as a scatter plot? Let me spend a few minutes on this, because it's one of those fundamental ideas that, once you understand it, makes life very simple going forward. The idea is that we take each of these columns and make them axes of a chart. In this diagram, for example, I'm showing a dataset with only two columns: column one is a person's annual income, and column two is the value of the property that he or she owns. So p = 2, and I have data about 50 people — 50 pairs of numbers, each row representing a person — so n = 50, with each person's annual income and the value of the property they own. Each bubble here represents a person: their position on the x-axis represents their annual income, and their position on the y-axis represents their property value. So you are representing relational data as a scatter plot where each column has become a dimension of the chart — which is exactly why columns are also called dimensions. In two-dimensional data you have two axes; if I added a third column there would be a third axis, and with a fourth column you would no longer be able to visualize the data, but the math works for any number of dimensions.
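The income/property example can be sketched as data (all numbers are made up for illustration): 50 rows, two columns, where each row of the table is one point on the scatter plot.

```python
import numpy as np

# Toy version of the example: n = 50 people, p = 2 columns
# (annual income, property value). Numbers are invented.
rng = np.random.default_rng(1)
income = rng.uniform(30_000, 200_000, size=50)
property_value = income * 3 + rng.normal(0, 20_000, size=50)
X = np.column_stack([income, property_value])

print(X.shape)  # (50, 2): each row is a person, each column an axis
# Row 7 of the table is the point (X[7, 0], X[7, 1]) on the chart.
```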
So we stick with two columns for the time being. The idea is: you represent the relational data as a scatter plot where each column becomes a dimension, each row falls somewhere on that chart, and the position of each row depends on the values of its columns. I'll pause here and make sure any questions are answered, because this is such a fundamental idea that it gets used very often in machine learning. "Professor, this is Rahul here. This is clear, but I wanted clarity on the attribute/tuple relationship. Do I understand correctly that an attribute — or feature, or column — is a way by which we can identify a tuple, together with the other attributes? For example, with images, or with text and its n-grams, these are various categories of attributes you can use to identify the tuple, the row?" Absolutely correct. "Perfect, thank you, sir." Sure. Another question: in the scatter plot, do we choose two columns out of the p dimensions? Yes, that is one way of understanding it — right now we are choosing only two columns, and you can choose many more. The only reason we choose two is that I cannot draw three or four columns on a slide, and four dimensions are hard to even imagine. Okay, so that is how we are representing data. Now, what is clustering? I give this dataset to a machine learning algorithm, and what does the algorithm return? It returns groupings — colorings — of these bubbles, such that rows which are similar are colored with the same color. You'll see the bubbles have turned green, orange, and purple: the clustering algorithm has said, here is one cluster, the orange cluster — people with high income and high property value — then there's the green cluster, and the purple cluster. The idea of clustering is to find elements — rows, tuples — which are similar to each other but distinct from the others. The orange bubbles are all similar to each other, but the orange people are different from the green people, who are different from the purple people. Now this seems a trivial thing to do, and you're right — it is absolutely trivial with two-dimensional data. But imagine doing this if the number of dimensions in your dataset was 20: you cannot visually plot a 20-dimensional chart. How would you do it if the number of columns was 2 million and the number of rows was 100 million — a spreadsheet of 100 million rows and 2 million columns — and I ask you to find clusters of rows that are similar, to find the people who are similar to each other? That's when you realize this cannot be done by hand; you need an algorithmic way of doing it, and that problem falls under the domain of clustering: grouping rows by finding those which are similar and putting them in the same cluster, the same bucket — assigning them the same color. Another way of thinking about it: if you look at this grayed-out space, clustering is also the process of identifying the parts of the space where data lies, because there are regions with no elements at all — and that itself can be useful.
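Here is a bare-bones sketch of the clustering step itself — a hand-rolled k-means on made-up (income, property value) data, with the centroids deliberately seeded one per group for illustration; real libraries handle initialization for you:

```python
import numpy as np

rng = np.random.default_rng(2)
# Three made-up groups of people in (income, property value) space.
X = np.vstack([
    rng.normal([40_000, 100_000], 5_000, size=(20, 2)),   # low income / low value
    rng.normal([90_000, 300_000], 5_000, size=(20, 2)),   # middle
    rng.normal([180_000, 800_000], 5_000, size=(20, 2)),  # high income / high value
])

# A bare-bones k-means: assign each row to its nearest centroid,
# recompute the centroids, and repeat.
centroids = X[[0, 20, 40]].copy()  # seed one centroid per true group
for _ in range(10):
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    centroids = np.array([X[labels == k].mean(axis=0) for k in range(3)])

print(sorted(set(labels)))  # [0, 1, 2] -- three clusters, one "color" per group
```

Each row ends up with a cluster label — exactly the "coloring" of the bubbles described above — and the same loop works unchanged for 20 or 2 million columns, which is why the algorithmic route scales where the eye cannot.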
Sometimes you look at the result and say: there are no data points in this part of the space — and that itself may be interesting. Okay, so this is the basic idea. Now I will leave you with some thought experiments. You don't need to answer, and we won't discuss them yet because examples are coming, but each time I'll flash three things and ask you to think about them for ten seconds. What happens when the rows are text documents and you run clustering? What happens when the rows are images and you run clustering? What happens when the rows are audio signals and you run clustering? I'll leave you with that — the answers are coming, but it's good to think about these things rather than my simply giving you the answer. Here is another interesting kind of unsupervised learning problem: anomaly detection. The setup is very similar — you are given X ∈ R^(n×p), your data in this format, and again you represent the relational data as a scatter plot, exactly as before. But this time what you are interested in are the elements which are not like any other elements: unusual, weird, different from everybody else. You see these points — orange, blue, purple, green — which are not like anything else in your dataset: not normal, anomalous, weird, interesting — you can say many things about them. The objective of this kind of algorithm is to find them, and you can actually use clustering to do anomaly detection: run a clustering algorithm, identify clusters, then look for points which do not really fall into any cluster, and call those the anomalous points. Why are they important? Think of them as the data points left over after clustering, and think about what happens if the rows are text documents and you find one which is not like any other document in your corpus — what are the applications of that? Or the rows are images: I have a collection of a billion images, and I found these 300 which are not like any of the others. There's a question from Parag about why the blue dot is left out — don't worry about that; this is a toy example to make sure you understand the problem. You're right that realistically it would not be left out; I think it's a hand-drawn toy diagram. What happens if the rows are audio signals and you identify one which is not like any other audio signal in your corpus? All thought experiments. The point of this exercise is to help you start thinking about how we will use these machine learning — or AI — algorithms to solve real-world problems. What is interesting is that on the same slide I can say we are doing data science, or ML, or AI, depending on whether the data represented in X ∈ R^(n×p) was text, image, audio, or structured data from marketing, finance, or supply chain. We have abstracted all that out — and hopefully, Mohammad, this answers your question: we have abstracted out the data format and moved from the data view of the world to the computational view, using feature engineering as the intermediary. No matter what your data format, use feature engineering to bring it into this relational format, and from here on we define algorithms for ML and AI which work on this relational data model, and the applications depend on the data format that is coming in.
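One very simple way to flag "points not like the others" is a distance-from-center score — a minimal sketch, with all data made up for illustration (real systems would use one of the many dedicated anomaly detection algorithms):

```python
import numpy as np

rng = np.random.default_rng(3)
# 200 "normal" made-up points around one center, plus three weird ones.
normal = rng.normal([50, 50], 5, size=(200, 2))
weird = np.array([[120, 10], [0, 140], [150, 150]])
X = np.vstack([normal, weird])

# A simple anomaly score: distance of each row from the mean of all rows.
center = X.mean(axis=0)
scores = np.linalg.norm(X - center, axis=1)

# Flag anything much farther out than typical.
threshold = scores.mean() + 3 * scores.std()
anomalies = np.where(scores > threshold)[0]
print(anomalies)  # indices of the weird rows: [200 201 202]
```

The flagged rows are exactly the "left over after clustering" points: they sit far from where the bulk of the data lies.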
Question from Parag: is there a minimum number of data points which qualifies as a cluster, or is it dependent on the number of rows? The answer depends on the underlying algorithm you are using. Remember, anomaly detection is a sub-field — an application — of unsupervised learning, and there are dozens and dozens of anomaly detection algorithms out there; each solves this differently. Some have a minimum number of data points; some make it a function of the number of rows in the dataset. It's a fair question, but the answer depends on the algorithm you use for anomaly detection. These are basically problem statements in machine learning, to help you understand where these things are used. All right, here is another kind of unsupervised learning: recommender systems. This is a very rich field with lots and lots of applications. Imagine, for example, that you are an e-commerce platform: each row represents a user or customer, and each column represents a product that you sell. Say you sell six products and there are five users: X is 5 × 6 — five rows and six columns — and this user has rated this product five stars, this product one star, this product one star, and this product two stars. That is the data given to you. Note that a new word has popped up: you are given a *sparse* X ∈ R^(n×p). The word sparse is used in mathematics and computer science to mean there are lots of empty cells — the matrix is only partly filled out. That is very common in recommendation systems, because the number of products you sell is often in the millions, and no user will have rated every single one of them, so most of the cells are in fact empty or blank. That's why we say the dataset for a recommender system is sparsely filled out. So you're given a sparse X ∈ R^(n×p), and the problem statement is: complete the matrix — fill out the numbers you expect to see in those blank cells. I'll give you a moment to think about why that might be very interesting. The reason is this: if, by filling out the matrix, I learn that user D is likely to rate item one at three, item two at two, and item six at four, then I immediately know it might be a good idea to recommend item six to user D — because he's likely to rate it four, which is another way of saying user D is likely to like product six, which is another way of saying user D is likely to buy product six. So completing the matrix is another way of saying I want to predict the missing values. Opening the lid on how these systems work underneath the hood: you can predict those values based on user-user similarity — you can say user D is similar to user B, or to user A. When are two users similar? If they rated the same products and gave the same ratings, we can say they are similar to each other. So, based on user similarity over past ratings, you can predict the missing values. You can also predict them based on item-item similarity. Or you can fill out the missing values using a trick from linear algebra called matrix completion — a mathematical idea; we're not going into the algorithmic part.
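The user-similarity idea can be sketched in a few lines — the tiny ratings matrix below is made up for illustration, and the similarity/weighting choices (cosine similarity, similarity-weighted average) are one simple variant among many:

```python
import numpy as np

# A tiny made-up ratings matrix: rows = users A..E, columns = items 1..6.
# 0 means "not rated" -- the blank cells of the sparse matrix.
R = np.array([
    [5, 3, 0, 1, 0, 4],   # user A
    [4, 0, 0, 1, 1, 5],   # user B
    [1, 1, 5, 0, 4, 0],   # user C
    [5, 0, 0, 1, 0, 0],   # user D -- item 6 unrated; can we predict it?
    [0, 1, 5, 4, 0, 1],   # user E
])

def predict(R, user, item):
    """User-based collaborative filtering: average the ratings that
    similar users gave this item, weighted by how similar they are."""
    sims, weighted = [], []
    for other in range(R.shape[0]):
        if other == user or R[other, item] == 0:
            continue  # skip self and users who haven't rated the item
        # Cosine similarity between the two users' rating rows.
        sim = R[user] @ R[other] / (np.linalg.norm(R[user]) * np.linalg.norm(R[other]))
        sims.append(sim)
        weighted.append(sim * R[other, item])
    return sum(weighted) / sum(sims)

print(round(predict(R, user=3, item=5), 2))  # high -- D resembles A and B, who liked item 6
```

Because user D's row looks like A's and B's, and both of them rated item 6 highly, the predicted rating comes out high — which is exactly the signal that says "recommend item 6 to user D".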
You can predict these missing values in many, many ways, and each approach here is a different recommendation system. For example, the first approach is something called user-based collaborative filtering, UBCF. The second approach is called item-based collaborative filtering, IBCF. The third one is called non-negative matrix factorization, and that is matrix completion. So there are many ways to solve this problem. We are not going to go into the solutions; the point is just to understand what the problems are, because that is the art today. At the senior level at which you are in your organization, your job will be to look at a real-world problem and say, I can use recommender systems to solve this problem, or I can use anomaly detection to solve this problem. Algorithms are more at the implementation layer, which you will be exposed to hands-on, but I think the more important skill today for leaders is to be able to say what kinds of problems can be solved by AI/ML, and that's what we're trying to do. So why is this idea powerful? Because if you know that the predicted value is high, then you can recommend items to users; you can also recommend which users to target when launching a new product. So you can recommend items to users, and you can also recommend users for items, and that's why this is a very rich field. Now some thought experiments. I mentioned that the numbers in these cells represent product ratings, where rows are users and columns are products; that is one we already talked about. Another way of applying this is that the rows are still users, but the columns are YouTube videos, and the rating in the cell captures whether you viewed 100% of the video, or 10%, or 50%. So think about what you get here: rows are users of a video streaming platform, columns are all the videos, and you are implicitly deriving the preference of a
user for a video by looking at how long the video was viewed for. Or you can say, I will capture how frequently it was viewed; maybe it was viewed multiple times. And what happens if you build a recommendation algorithm on this data set? Well, you get a recommendation engine for YouTube, and you can do things like autoplay. Or, for example, Instagram: each row is still a user, each column is an Instagram post, and you put a value of one here if the user gave it a like, a value of two if the user forwarded it, a value of three if he or she bookmarked it, and so on and so forth. Then again you have this incomplete matrix, and this can be used to design Instagram feeds, for example: what to show the user next. Okay, I'll take a pause here and see if you have any questions on the three things that we talked about in unsupervised learning till now. Clustering, anomaly detection, and recommender systems all fall under the domain of machine learning; inside machine learning they fall under unsupervised learning, and we don't care whether the underlying data is from marketing or supply chain, or whether it is text or video or images. Do any of these recommenders intentionally recommend an anomaly from time to time, to bring newness? Absolutely, good point, they absolutely do this. Here is the counterpoint: if you don't do this, the users tend to get bored. So you need an element of novelty; they do bring in weird stuff just to try out whether the user is now interested, because user preferences also change over time. So they keep throwing these interesting or weird things at the user. But it is not binary; remember, interestingness is a spectrum and anomaly is also a spectrum, so I'm simplifying quite a bit. But for recommendation systems there are teams of
hundreds of researchers, data engineers, and scientists sitting in all these companies trying to optimize these recommender systems. Yes, absolutely right, that is exactly what happens: if you get a lot of followers, or follow a lot of accounts, then you tend to get weird things in your Instagram feed, because the system is learning from who your social network is and recommending stuff that they like. So that's an example of user-user similarity being used to recommend stuff. Okay, so let's move on and answer the questions that I asked. (On a chat question about cookies: well, off topic, but cookies have been deprecated by Apple, and Google is deprecating them as we speak.) Deepak is asking: are these three different techniques used together? Sometimes they are; I'm sure they are somewhere, we would have to cook up an example to explain. But in general, Deepak, keep in mind that what I'm covering today are Lego blocks, and you can combine these Lego blocks to create solutions. So a recommender system is a Lego block, clustering is a Lego block; we'll talk about more algorithms, and what you need to use really depends on the problem that you're trying to solve in the real world. Okay, so let's try to answer the questions that we asked a few slides back. What happens when your data set is one where each row is a text document and you run a clustering algorithm on it? Remember, documents can be represented in a relational data format, an X belonging to R n cross p, where in the simplest case the columns are words; but columns can be n-grams, they can be dense vector word embeddings, they can be TF-IDF, they can be many things. How they are represented is outside the scope of this slide; we'll assume documents exist in the feature space of words. Another way of imagining this is that each column is still representing a dimension, and each document lies in a particular
position in that space of words, or TF-IDF words, or what have you. So documents exist in the feature space of words, and the intuition is that documents containing the same words are similar. So what can you use this for? Here are some examples. You can cluster news articles to organize them by topic, thereby improving the user experience of a news app. You can cluster product reviews to identify common issues that customers are complaining about, or are really happy about; if you have 100,000 reviews, you cannot do this by hand. You can cluster tweets to identify trending topics or discussions. You can cluster legal documents for case analysis and legal research. And you can cluster customer support tickets to identify the common issues that are propping up for your customers. So those are examples of why you would want to cluster text documents. Okay, what happens when you do image or video clustering? Again, images exist in some feature space. We talked about the most raw way of doing this, which is to represent images as pixels, but there are many derived features: histograms that we talked about, CNN-derived features, gradients, all of them. So images are represented in some feature space; that we have already covered in feature engineering. What we are trying to see is which images are similar to each other. Why would you want to do this? Well, clustering images can be very helpful for things like image search, because you want to pull up similar images, and Google actually does that: if you click on an image after a Google search result, it also shows you similar images down below in the right panel. So that is what image clustering is being used for. You can also cluster different segments within an image, and that is what the image on the right-hand side is showing, saying that this area is land being used for agriculture, this
is forest, this is a disaster area, a bomb fell here, this area is on fire. So instead of clustering whole images, you can cluster parts of images, to say: I want to identify different kinds of things inside these satellite images. You can cluster segments within an image to aid object detection; again, for satellite imagery it makes a lot of sense to do this. In retail store video, where frames are basically images, you can cluster them to analyze customer behavior and track which products are popular: for example, if images which have lots of people in them belong to the same location, you know this area is popular and therefore this product is popular. You can also use this on human activity, to track humans as they move over a city area, which can be used for surveillance, or in recordings of sports to find highlights: for example, in cricket, when someone hits a six, the sets of frames for every six will tend to look similar to each other. Not necessarily, but they'll start to look similar. So you can do things like this with image or video clustering. Audio: what happens if you use a clustering algorithm on audio data? Well, one of the things you can use it for, and this is very, very common, is something called speaker diarization. Speaker diarization means: suppose I have the audio recording of a movie. Remember that this audio will have dialogue by multiple people; one person will speak and then another person will speak. Let's assume only two people are talking. Speech recognition is about what the person is saying, and that has its applications, for example in closed captioning or subtitles. But another interesting thing is to know who is speaking. So if you choose the right features of the audio, the ones which help identify who is speaking, then you can cluster these
audio waves, as we are showing here: cluster them into green and blue, saying this audio signal was created by speaker one, this audio signal was created by speaker two, and this audio signal was again created by speaker one. So you can do attribution, or diarization, identifying who is speaking, and this can be done by clustering audio segments based on speaker identity. You can do it, of course, for music: if you have a library of a million songs, you can cluster those songs together to say, well, these kind of sound similar, and they form a cluster. Why would this be important? Well, either you are trying to identify new emerging genres in music, or you can use it for recommendation, saying that if somebody likes a lot of songs in this cluster, maybe we should recommend something else from this cluster to that user. You can cluster podcasts based on content, what is in those podcasts, to make a recommendation system. You can cluster speech samples: say here are a million recordings in English, but the way I speak English and the way you speak English might be very different, because I likely have an Indian accent and you have the accent of wherever you belong to. So a million audio signals are broken down into five clusters, saying these are Indian speakers, these are French speakers who speak English, these are Italian speakers who speak English, and so on. So you can use clustering to identify different accents or dialects in audio signals. And you can use it for health reasons: over a period of time, you can track the changes in the vocal characteristics of a person. This, for example, is used for early detection of Parkinson's. There are other examples of tracking a time signal over a period of time, and if it is changing too much, then you say okay, the new
signal is very different from what it was two months ago, the clustering is different, here's an outlier, and we may want to look at what this person is going through. Okay, so that was a very quick introduction to unsupervised learning, which is a computational view of the world. Let me take a pause here and see if you have any questions on what we have covered so far; otherwise we'll switch gears to the second part of machine learning, called supervised learning. Okay, so let's move on. Well, the best way to understand supervised learning is through something called function approximation, and we'll explain what that means. Again, our data model still looks like a relational data model; irrespective of text, image, or audio, we don't care, we are now in the world of the relational data model. What is interesting now is that we view the data as X1, X2, X3, ..., Xp, which are the different columns (p is the number of dimensions, remember, and each one of these X's represents a different attribute or column), but now we have added something called y, a new variable named y. The y is called by many names: y is called the dependent variable, it is also called the outcome, it is also called the output. Now, what is interesting is that this is written in a very precise way. It says: view the data as X1, X2, X3, ..., Xp, y, which means that when you are given a spreadsheet or a data frame or a matrix, you may decide to call one particular column y, or you may decide to call it X3. That is really your choice, and this again requires a lot of understanding of the enterprise, the organization, the domain you are in, and what they are trying to solve. But if you choose to call a particular column y, what you are basically saying is: my interest is in being able to understand how y changes as the other X's change. So the supervised learning problem is the task of inferring a function f.
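As a small aside, here is what "pick a column to call y, then approximate f" can look like in code. This is an illustrative sketch with made-up numbers (a salary-like table, echoing the example that follows), assuming a linear form for f and fitting it by least squares; it is not a specific example from the lecture:

```python
import numpy as np

# Synthetic table: columns = [education_years, age, salary]; rows = people.
# All numbers are made up purely for illustration.
data = np.array([
    [12, 25,  40.0],
    [16, 30,  62.0],
    [16, 40,  70.0],
    [18, 35,  75.0],
    [12, 45,  55.0],
    [20, 50, 100.0],
])

# Decide to call the last column y; everything else becomes X.
X, y = data[:, :-1], data[:, -1]

# Assume f is linear: y ~ X @ w + b. Append a bias column and solve
# the least-squares problem min ||Xb @ wb - y||^2.
Xb = np.hstack([X, np.ones((len(X), 1))])
wb, *_ = np.linalg.lstsq(Xb, y, rcond=None)

# Use the learned f to predict salary for a new person:
# 17 years of education, age 38.
y_hat = float(np.array([17.0, 38.0, 1.0]) @ wb)
print(round(y_hat, 1))
```

The point is the framing: the same table becomes a supervised learning problem the moment you designate one column as y and ask for a function from the remaining columns to it.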
And that is what we are saying: supervised learning is unlike unsupervised learning, where the problem statement is very generic, find patterns in data. Here we have a much more crisply, strictly defined problem statement: you have to find a function which relates the value of y to the values of the other columns. So you say, I want to understand how salary changes with education level, age, and location; that is a supervised learning problem. Or you can say, what's a good example, how does customer retention change with the average amount of money they have spent, the total number of products they have bought, the frequency with which they have shopped at my site, and so on. Is y in this case a label? Yes, y is also called a label; label, dependent variable, output, outcome, y is called many things. But the idea is to learn that function, and that is why we said in the slide before this that supervised learning is a problem of function approximation: you are supposed to learn the function f which relates the value of y to the values of the other variables. So there are multiple ways of saying this: find how the value of the dependent variable depends on the values of the other variables; find how the outcome is related to the features; find how the output depends on the input. All of these are basically the same problem statement: you are given the relational data, and you are trying to find y as a function of X1, X2, X3, ..., Xp. Another term which is very commonly used to refer to supervised learning is the term modeling, and I want to spend a few minutes explaining where this word comes from and how it is actually the same thing that we just talked about, from a different perspective. Okay, so the idea of a model: if you look at science, the idea is that a model is a
simplified and idealized understanding of a physical system. The best example of this is a globe: a globe is not the Earth, a globe is a model of the Earth, and of course it does not capture all the characteristics of the actual Earth, but it does a reasonable job and adds value to our life. Similarly, a Google map: the Google map is not the road, it is a representation of the road. So a model is a simplified and idealized understanding of a physical system. If you go one level deeper, we have the notion of a mathematical model: a mathematical model is a representation of a system expressed using concepts from mathematics. One simple example of this is what Newton discovered, called the inverse square law. Newton basically said that the gravitational pull between two objects falls off as the square of the distance between them; that is called an inverse square law. And look at what we are doing: we are defining a physical property in terms of mathematics, and that is what is called a mathematical model. In this case we are saying gravitational pull is inversely proportional to r squared, a mathematical function. You can have many, many mathematical functions describing the relationships between different features, and all of these are mathematical models. In general, a mathematical model looks like: y is a function of X1, X2, X3, ..., Xp, and you can start to see why this term is used in supervised learning. One level more, and we have the idea of a statistical model, and now we are getting into statistical modeling or machine learning, the heart of it. The idea is that a statistical model is pretty much a mathematical model, but there is one important thing to worry about. If you
look at a mathematical model, a mathematical model is a deterministic relationship between the value of y and the values of X1, X2, X3, ..., Xp. But in a statistical model, very likely, you can have different values of y for the same combination of X1, X2, X3, ..., Xp, and therefore you need to be able to tolerate some noise, and that noise is called epsilon. Now, if you're new to this area, a lot of this will sound like French, so let me show you an example, and hopefully this example will make things clear. Let's look at this picture. I want you to stare at this picture for 10 seconds, and then I will explain what is going on in it and why this is supervised learning. Okay, so let's understand what this is. This is a scatter plot. If you were to imagine what this data would look like in your spreadsheet, it would have two columns. On the horizontal axis is TV: the amount of money spent on TV advertising, let's say in thousands of dollars. On the vertical axis you have data about the sales that your organization did, let's say in millions of dollars. If you are working with structured data in marketing, this is a very common thing you will try to understand: how do my sales change with the amount of money I spend on advertising? So, and I want to focus your attention on the red dots, each red dot that you see on the screen is a marketing campaign or an ad campaign that you ran. For example, this particular one is an ad campaign that you ran with a budget of $150,000, and it resulted in sales of $7 million or $8 million. Then at some point in the future you ran another campaign, and again you spent $150,000, but this time the sales was $16
million. Why? There could be a variety of factors: maybe this campaign was run two years later than that one, because we are not capturing time here; or maybe this campaign was run during Christmas time and that one was run in February, and that's why there is this delta. So each red dot here represents an ad campaign, and its position on this graph depends on how much the budget for that ad campaign was and how much sales it achieved, and that set of red dots represents a data set from a marketing example. Let me pause here and see if there are any questions about what this data set is. Okay, so if you're comfortable with this data set, note that sales is your y: sales is your dependent variable, your outcome, your output. And your independent variable, or feature, or dimension, or column, or input, is the amount of money spent on TV. This is an example of being given (X, y) for multiple past campaigns: given past TV spend and sales for ad campaigns, can you find a function which describes how sales changes with the amount of money spent on a TV ad? So we zoom out and say: given (X, y), the problem is one of finding a function f such that I can describe y, and here we have only one X, which is the amount of TV ad spend. So can we define y, can we define sales, in terms of the amount of money spent on a TV ad? And the reason this epsilon, the noise term, is sitting here is because we recognize that the TV ad spend is not the only thing which determines sales; there are many other factors, so for the same amount of money spent there can be multiple sales numbers, and that's why we say y = f(x) + epsilon. Don't worry about the second line; you will worry about that when you do basic machine learning. But that's what fundamentally
this supervised learning problem is. Now note that we have only one variable here, called TV. Maybe there is a second variable, say the number of salespeople that I have, because more salespeople typically leads to more sales; so it's not just advertising that matters, it's also the number of salespeople. If you have two variables which influence sales, what will this diagram look like? It will look something like this; let me zoom in. Y is again sales, each red dot is a marketing or advertising campaign, X1 represents the amount of money spent on the TV ad, X2 represents the number of salespeople that the company had in the field at that point in time, and you can now describe this plane as a function which describes how y relates to X1 and X2. So the problem statement is still the same: given (X, y), find a function f such that y is a function of X1 and X2. Now, when the value of y is numeric, which means y can take any number including decimals, we say y is a numeric variable, we typically plot y as one of the axes in our diagram, and we say we have a regression problem. So regression is one kind of supervised learning problem, and the regression problem in this picture has been solved by a machine learning algorithm called linear regression. There are many other kinds of regression algorithms, like support vector regression and decision tree regression, which again we won't go into today; today our objective is to understand that this is a regression problem under supervised learning. And why is this called a supervised learning problem? How do we use it? Well, we can use it to make a prediction; note how the word prediction is now coming in. You can predict the value of the output for a new data element, which means: if I tell you that I'm expecting to run a new marketing campaign with a budget of $150,000, can you tell me, and let's
stick to the first graph, ignoring the number of salespeople. So if somebody says, I'm expecting to run a marketing campaign with $150,000 as my budget, what should I expect my sales to be? Well, the way to do this is: the linear regression model takes these red data points as an input and returns the blue (or purple) line as an output. The reason the line is important is that now we can make predictions: with $150,000, I go to the blue line and read off the expected sales. That is how you extract information from past data (the red dots) to create a model (the blue line) to make a prediction: given x, you find the corresponding value of y from the blue line, and that is what supervised learning regression does. Let me take a pause and see if there are any questions. A question: Professor, I understood the graph, but I'm not getting exactly why it is called regression. Forget the word; forget the terminology. Do you understand the concept? Yes. That's enough. Once you actually study the mathematics behind it you will understand where the word comes from, but that's not important. And Rahul, you're absolutely right: in the first case y is a function of X1 only, and in the second case y is a function of X1 and X2. All right, now let's look at another kind of problem statement in supervised learning. This one is called classification. The setup is very similar: you are again given (X, y), and you are again asked to find a function f such that y = f(X1, X2, ..., Xp) + epsilon, but this time the important thing is that the y variable is not a number. Let's stare at this picture for a minute; I want you to ignore the blue curves and lines and just focus on the black
dots. Remember, each dot or bubble represents a data point in your data set, and if you look at the black dots, the y values are either this one at the bottom or this one at the top, which means y can take only two values. Unlike the previous graph, where y could be any decimal number, here y can take only two values. Here is another diagram: again y is a function of two variables, X1 and X2, but y can take only two values. Note that all the data points are either on the floor, where y is zero, or on the roof, where y is one. Irrespective of how many dimensions there are, I can have X1, X2, X3, ..., Xp, but y is taking only two values. When you have data sets like this, where y can take only two values, we say y is a binary variable; and in general, if y can take any one of a finite set of values, three, four, five, we say y is a categorical variable. This is a special case of supervised learning which we call classification. Note that in this diagram y is still plotted as one of the axes, and as before, f can predict the value of the output for a new data element. Absolutely, you've got it: there's an example coming, and it is exactly the example just given in the chat, an application of classification when the data set you are working with is text. Since the cat is already out of the bag: y can take two values, spam or not spam, and each row in your data set is an email, described in terms of the words it contains, and you want a function which takes the words in an email and predicts whether the email is spam or not. That is exactly a great example of using classification to solve an NLP problem, which is a subdomain of AI, and we'll see many more examples in a few minutes. I want to show you another
interpretation of classification, which is more commonly used in the field. In this case the problem statement is essentially the same, but it is modified. Again you are given (X, y), but instead of trying to find a function which outputs the value of y as a function of X1, X2, X3, we do something interesting, and to understand what we do, I want you to stare at this picture. We have a data set with three columns, X1, X2, and y, but note that y is not an axis in this plot. Instead, since y can take only two values, why waste an axis on it? Let us use colors to represent y. So we color the bubbles orange and blue depending on whether the value of y is, let's say, spam or not spam, and X1 and X2 are used as the axes. What you have done is saved an axis, but more than that, you can now reframe your classification problem as this: given the orange bubbles and the blue bubbles, write down a function which describes a curve that separates the orange points from the blue points. And you can have more classes, say four classes: for example, instead of just spam-or-not, you can say (and this is what Google does) take an email and classify it into spam, inbox, social, or bulk mail. So there are four labels now, represented here as red, orange, blue, and green; X1 and X2 are different features, each bubble here is an email, and I want you to find an equation which describes this curve such that the green points are separated into one part of the space, the red points into another, the orange into another, and the blue into another. Why is this an interesting way of thinking about the problem? Well, if you give me a new data point, that means you are giving me its X1 and X2 values, and if you give me the X1 and X2 values, I
know where in this space that point will fall. If it falls here, I will say it is likely to be red, because it is surrounded by red points; if you give me a new row which falls here, I will say it is likely to be blue, because it falls in the blue region. So this is a different interpretation of classification: we are finding a function again, but instead of finding a function which outputs the value of y by taking X1, X2, ..., Xp as inputs, we are using the function f to describe the boundary between the classes, and using that to make predictions. Note that y is plotted as colors, not as a separate axis; as before, f can predict the value of the output, the dependent variable, the outcome, the label, based on which region the element falls in. Okay, so this is a classification model. And you're right, Rahul: the second example is not binary classification, it is four-class classification. And to the question, doesn't it look similar to clustering? It is not; I encourage you to go back and review the lectures. Remember, clustering is an unsupervised learning problem: there is no y, we are not trying to predict anything, we are trying to group data. Yes, it looks similar because we use colors in both cases, but the problem statements are very, very different. Okay, we will end the lecture by looking at why the hell we are talking about classification. What happens when you apply classification techniques on text data? Well, this is a slide that you saw yesterday: email spam detection, where the label, the y, is whether an email is spam or not, and the X1, X2, X3, X4 features are the words in the email; there may also be metadata about when the email was sent, who the sender is, the time, and so on. Sentiment analysis: y is whether the sentiment is positive or negative, and the X's are words. In fake news detection, y is fake or not fake; X1, X2, X3, X4 can be the words, it can be who the reporter of
the news is, it could be how many people forwarded it, it could be the time it was sent, and so on. Customer support ticket routing is a multiclass classification problem: if you have five departments, y takes on the value of one of the five departments, and X1, X2, X3, X4 are the words in the document, saying where this ticket should be routed. Topic categorization of news: you have, say, 20 categories of news that you serve, sports, entertainment, politics, whatever, and those become the classes; your objective is to predict which topic a news item belongs to by looking at the content. Language identification: there are, let's say, 300 languages in the world, and you are trying to predict the language of a document by looking at different features. Resume screening: you are looking at a resume, which is text, and trying to predict whether this is a good fit or not a good fit for your organization. Legal case prediction: you are trying to output whether a legal case will win or lose based on the content of the case. So what I'm trying to do is help you think the way computational people think in the machine learning and AI world. There are these two broad categories that we have covered today, unsupervised learning and supervised learning (in a future class we'll also talk about reinforcement learning), and within them there is this bunch of computational problems we have identified, which have hundreds of applications in the real world. If you understand these basic ideas of regression, classification, clustering, anomaly detection, and recommendation systems, you suddenly have a new way of looking at the world: you look at the world and say, well, this is this kind of a problem, and the minute you are able to map it, this whole field becomes available to you, you know what to do.
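To make the "regions in feature space" view of classification concrete, here is a minimal sketch. This is not an algorithm from the lecture; it is a toy nearest-centroid classifier, one of the simplest ways to carve the plane into class regions, where the boundary between two classes is the perpendicular bisector between their centroids:

```python
import numpy as np

# Toy 2-D points with two labels (0 = "blue", 1 = "orange"); illustrative only.
X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.0],     # class 0 cluster
              [6.0, 5.0], [7.0, 6.0], [6.5, 5.5]])    # class 1 cluster
y = np.array([0, 0, 0, 1, 1, 1])

# "Training" is just computing one centroid per class.
centroids = {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def classify(point):
    """Predict the label whose region the point falls in:
    the label of the nearest class centroid."""
    point = np.asarray(point, dtype=float)
    return min(centroids, key=lambda c: np.linalg.norm(point - centroids[c]))

print(classify([2.0, 2.0]))  # near the class-0 cluster -> 0
print(classify([6.0, 6.0]))  # near the class-1 cluster -> 1
```

Real classifiers learn far more flexible boundaries than bisectors between centroids, but the interpretation is the same: a new point gets the label of the region it lands in.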
You know how to solve that problem, because many other people in the world have solved it using the same techniques, and that's really the power of abstraction that machine learning and AI bring. Face recognition, again, is a classification problem. Suppose you are in charge of the security system of a company which has 20,000 employees: you have 20,000 labels, the y value is the name of the person, and every time you get an image you are asked to predict who this is. That becomes a face recognition problem. Or, a simpler problem: if you are writing the software which unlocks a phone, all you have to do is predict whether this person is the owner or not the owner, and that becomes a binary classification problem. So it really depends on the application, but it is a classification problem. Speaker recognition, another example that we saw: the y value is the speaker's identity and the x values can be different features derived from the audio signal. This again is a classification problem. So, a good time to pause; I will stop here. Next time, when we pick up, we will talk about reinforcement learning, and the next class will be split: the first part will be heavy, because we talk about reinforcement learning, and the next part will be light, because I will start introducing innovation frameworks which you will use to do the assignments, and which will hopefully help you identify places in your organization where you can actually use this. But I will stop here and leave the last two or three minutes for any questions that you might have. Thank you, I'm glad people liked the session. All right, if there are no questions, then we will stop this class, and I'll see you next week. Bye-bye.