Hello, good morning, and a very warm welcome to all of you joining today's session: good morning, good afternoon, good evening from whichever part of the world you are joining. Let us wait another minute or two to give time to those who are still joining. Okay, I think we are ready to begin today's session. Again, a very warm welcome, and thank you for giving me your time. Today we are going to look at a hands-on implementation of linear regression; that is our lab agenda. I'll share my screen. Please let me know if it is visible to you. "Yes, Professor." Thank you. We are going to perform this lab using Microsoft Azure ML Studio Classic, so I would like all of you to open Azure ML Studio Classic and log in. Once you land on the Experiments page, click on New and then on Blank Experiment to start a new blank experiment. When you're ready, type a yes in the chat window and we can begin.

"Just one more minute, Prof. I'm lost; how do I get to Microsoft Machine Learning Studio?" If you can still see my screen: go to Google and type studio.azureml.net. That will take you to the Microsoft Azure ML Studio website, and you can sign in using your GGU ID or whatever ID you are using. I have already signed in, so my screen shows my experiments, but in your case it will show a Sign In button; click on that and sign in with your GGU ID or any other email that is tied to a Microsoft account. Once you're ready, please send me a yes in the chat window so I'll know.

Okay, I have a yes from Richard, that's good, and a yes from Raj; a couple more yeses, but I'm still expecting more learners to join before we start the Azure session. I see some more yeses now, so let me know when we are good to go.

"What do you mean by the sample dataset? In the Samples window we can see several, like the Adult census one; which one should we use?" We haven't started yet, but since you're asking: it's the third one here, the Automobile price data. For now you only need to open Azure ML Studio Classic, click on the New button in the bottom-left corner of your screen, and choose Blank Experiment to start a new blank experiment. "Can we start now?" "Yes, Professor." Okay, I see a lot of yeses, so I think we can start. Since we are solving a linear regression problem, as always we need an input dataset, and the dataset we are going to select comes from under Samples in the left-hand pane: it is the third one, which says Automobile price data. Please drag that
and drop it into the workspace area here. I already have a running version of the experiment in place; that is what you need to do on your side.

Now let's explore this dataset before we begin our onward journey of building a linear regression model from it. It has 205 rows, not many, and 26 columns. Since we are solving a linear regression problem, is it supervised or unsupervised? "Supervised." Supervised, right. We are solving a supervised learning problem by building a linear regression model, and because it is supervised learning there has to be a target variable. The target variable here is price, so we are going to build a linear regression model that tries to predict the price of a given automobile. Every row represents one automobile, described by a multitude of 25 features, together with its corresponding price listing.

Typically this kind of data is maintained by insurance companies that provide automobile insurance. It is of interest to them to maintain this kind of data so that they can minimize their losses, that is, the payments they make whenever mishaps or insurance claims arise, so this is also an important problem to solve.

Now let's look at the attributes, the columns, one by one. The first column is called symboling. Symboling is basically a categorical variable whose categories here range from -2 to +3, where negative values indicate a high risk of insuring the vehicle, zero means the risk is minimal, and higher positive values indicate a fairly low risk of insuring the vehicle. Symboling is something insurance companies maintain in order to classify different vehicles by how expensive they are to insure. In the same context they also maintain something called normalized losses: the losses incurred by the insurance company in making payments for these vehicles in the case of a car crash, or for any issue that is covered by the policy. Whenever an issue arises with an automobile that is covered by the insurer, a corresponding payment is made to the vehicle owner, and that incurs a considerable loss. So insurers maintain this data as well; it tells them for which kinds of vehicles the losses are largest and for which they are not significant. That is normalized losses. Then, for the rest, you have something called the make of
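For reference, the same first look at the data can be sketched in a few lines of pandas. This is only a sketch: the file name and the exact column spellings are assumptions, not something Azure ML Studio exports by default.

```python
import pandas as pd

# Load a local CSV copy of the Automobile price data
# (file name and column labels are assumed for this sketch).
df = pd.read_csv("automobile_price.csv")

print(df.shape)            # expect roughly (205, 26)
print(df.dtypes)           # a mix of numeric and string (categorical) columns
print(df["price"].head())  # the target variable we will try to predict
```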
the car. Whereas symboling is categorical, normalized losses is numeric. The make of the car again has different values; altogether there are roughly nine or ten makes here, and the make is basically the car company: a Toyota, a Nissan, an Alfa Romeo, and so on. The fuel type of the car is one of two values, gas or diesel; in the West, gas refers to petrol, so the vehicle is either petrol powered or diesel powered. Aspiration relates to the air intake of the engine and has two types, standard or turbo; turbo means a larger air intake and, correspondingly, a higher power output from the engine. Then you have the number of doors, either a two-door or a four-door car, and a null here probably refers to missing data. These are all categorical variables, and then there is body style.

"We have a couple of questions. Symboling and normalized losses, I'm not very clear; could you explain them in one line?" Symboling: this kind of data is maintained by insurance companies, and symboling refers to the relative risk of insuring the vehicle. It is a kind of risk assessment they do when they insure a particular automobile. "A risk for the insurance company?" Yes, a relative risk from the perspective of the insurance company. A higher risk automatically means they will have to make larger payments if any incident occurs involving that automobile, so they will incur larger losses; that is what a higher risk of insuring the vehicle means. "That's fine. Could you explain normalized losses one more time?" Normalized losses are the losses incurred by the insurance company in making payments towards insurance incidents. Whenever incidents occur involving automobiles they insure, they have to pay out in order to honour their contracts, and the net loss incurred in insuring a particular vehicle is also recorded by the company. If it is, for example, an Alfa Romeo or a Porsche, these very expensive vehicles will incur much larger payments if they are caught in an accident, if there is damage to the body or to the electronics, compared with other vehicles. Those payments show up as losses, and that data too is maintained by insurance companies.

"Professor, the term normalized here has no relation to normalization, right, where the data is adjusted?" No, it is not the normalization we will perform later. But when this dataset was created, some kind of normalization was probably applied to these values, and that is why the word normalized is used here: to say that
these losses are not the raw figures; they were normalized in some way during the creation of the data. "Okay, thank you." Is it clear now? "Yes."

Moving onward, you have more of these categorical features. For drive wheels there are three categories: fwd is front-wheel drive, rwd is rear-wheel drive, and 4wd is four-wheel drive. Then engine location: in some very high-end cars, like the Italian Lamborghini and so on, the engine is mounted at the rear. You see that very few automobiles have rear-mounted engines; the great majority have front-mounted engines. So engine location is again a contributory feature for the price of the car: if the engine is at the rear, the chances are it is a very highly priced car, versus a front-mounted engine in a lower-priced automobile.

Then you have the numeric descriptors. Wheel base, a numeric attribute, is the distance between the front axle and the rear axle of the automobile. Then there are the length (bumper to bumper), the width, and the height, the dimensions of the car, all numeric. Curb weight is also numeric: every individual automobile has a certain weight, depending also on the accessories of the car. "Every car of the same make should have a similar weight, right?" That's true, but extra accessories add additional weight, so different mixes of accessories produce different weights for the car.

Engine type has about six or seven categories, such as rotor, ohc, and so on. Then the number of cylinders: larger, high-end cars like your BMWs tend to have more cylinders, for example a V8 or a V12 engine, where V12 means it has twelve cylinders. Again, very few cars have twelve-cylinder engines compared with the number of cars that have only four or six cylinders, and a larger number of cylinders will generally contribute to a larger power output from the car. Then engine size, again a numeric attribute. Then the fuel system: mpfi here relates, I think, to multi-point fuel injection, and there are roughly five to seven different types of fuel injection systems. Then you have the bore diameter of the engine, the stroke length, the compression ratio, the horsepower (the power output of the engine), and the peak RPM, peak revolutions per minute, which relates to the number of
times the engine's flywheel revolves per minute. Then city miles per gallon: this is again a numeric feature, and it gives you the city mileage of the car being insured. Some cars may be old and some brand new; a brand new car will probably give you better mileage than an older one, so this will vary. If you look at the data, miles per gallon in the city is far lower than highway miles per gallon, and that is because on the highway you drive at a roughly constant speed, which burns fuel more efficiently than constantly braking and making frequent stops within city limits; so the car gives you less mileage in the city than on the highway. And last but not least we have our target variable, price. Every automobile has a price, depending on how old or new the car is; if it has done a few years there is some depreciation of the price, and all those factors, along with the characteristic features of the car, determine the price. So that's the dataset we are using, and because we are going to perform linear regression on it, we will use price as our target variable.

At the very onset, please select all the columns in the dataset. Go to Data Transformation, then Manipulation (I think you know this, you have done it quite a few times before), and drag Select Columns in Dataset here. Launch the column selector; all the features will show on the left-hand side under Available Columns. Select all of them, move them to the right, and confirm your selection. The Select Columns in Dataset component allows us to make a selection for every feature and drop out certain features if they are not important, but here, intuitively, all the features are important in some sense or the other; they all have some impact on the target variable, and at first inspection there is nothing that looks completely unrelated or irrelevant that we could drop. So please do that and then run the component.

Now the very first data transformation we are going to perform is to remove duplicated data. The way we do it is by defining what duplication means: any two or more rows of data are duplicates of each other if their values across all the columns, not just one or two or three columns but all the columns including the target variable, are identical. Duplicates introduce unnecessary redundancy, and we would like to remove such redundancy from the dataset. Azure gives us a very easy way to do that, the Remove Duplicate Rows component, so introduce that and
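In pandas terms, that definition of a duplicate (identical values across every column, target included) is what drop_duplicates uses by default. A minimal sketch, assuming the df DataFrame from the earlier sketch:

```python
# Rows count as duplicates only if they match across ALL columns, price included,
# mirroring how Remove Duplicate Rows is configured in the Studio.
before = len(df)
df = df.drop_duplicates(keep="first")
print(f"removed {before - len(df)} duplicate rows, {len(df)} remain")
```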
make sure you join the connectors, then launch the column selector to configure Remove Duplicate Rows: select all the columns, including the label, and confirm your selection. So now let's see whether our data actually had any duplications in it. Initially we had 205 rows, and after this step it still has 205, so there are no duplicates in the data anyway; but we are still following a very generic process by means of which all of these issues get tackled.

Our next endeavour is this: because, if you look at the dataset, we have a mix of both numeric and categorical features, we want to separate the categorical features from their numeric counterparts. To do that we introduce the Edit Metadata component. Please drag Edit Metadata from under Manipulation, put it here, and join the connectors. Then click on Edit Metadata, launch the column selector, and move the categorical descriptors to the right-hand side; I think there are about eleven of them.

"Could we hold the screen for a few moments? We'll go ahead and copy that." Yes. "And one more thing, Professor: I'm stuck with Remove Duplicate Rows, it gives me a red exclamation mark." Have you joined the two connectors? "I joined them, but I'm not getting it." Okay, and then you need to launch the column selector. "Yes, Professor. Select Columns in Dataset is also not giving me a green tick." Have you connected it to the Automobile price data? "Yes." Then, for Select Columns in Dataset, launch the column selector. "I did that." And move all the columns across. "I did, but it doesn't give me the green tick." Then see what the error is. "It doesn't say anything." Do you have provision to share your screen? "I'll try. It says I cannot start screen share while another participant is sharing." Okay, I'll stop sharing. "Okay, Professor, sharing now." Remove this Edit Metadata for a moment. Now click on Select Columns in Dataset, launch the column selector, go down, make the selection, confirm your selection. There; it was because of that Edit Metadata connection. Now connect it again, click on Remove Duplicate Rows, launch the column selector, select all of the columns, and move them to the right-hand side. Done. "Thank you, Prof, I don't know why, but now it works. I'll stop sharing." Good, now we are back.

"Could you please go to Edit Metadata and show the list of selected columns?" I'm launching the column selector for Edit Metadata here; these are the selected columns. "It is showing 11; for me it was 10, so one is missing." Just hold on; you will have to check symboling, because if you go by the drop-down and
filter by String, it picks only 10. "Symboling is at the top; that's why we missed it." Yes, symboling should be included as a categorical feature. The ones on the left are numeric; the ones on the right are categorical. "Thank you, Professor. One quick question: if you go by the drop-down on all types and select String, then symboling is treated as numeric." Yes, because Azure on its own can only determine the type from the data type it is looking at; symboling appears to be a numeric feature, which is why it lands among the numeric columns. But from our domain knowledge we know it is not really numeric, it is basically categorical, so it has to be included as a categorical feature. "It is still not clear; can you give an example? Is it a way to determine whether the maintenance cost is high or not?" Simply put, in one line, symboling is a risk assessment for insuring a specific automobile: the risk incurred in trying to insure that particular vehicle. "So the higher the value, the higher the risk, right?" No, the lower the value, the greater the risk: the more negative the value, the higher the risk, and the more positive, the lesser the risk. So please make sure you include all of these; there are 11 categorical features here.

"Professor, sorry to interrupt, I don't have the dataset you're referring to. Where can I get this file?" It is already available under Saved Datasets, under Samples; you will see the Automobile price data there. "Thank you."

Okay, are we done with launching the column selector on Edit Metadata and demarcating which columns are numeric and which are categorical? Make sure you have 11 columns selected as categorical. "Yes, Professor." Okay, then please confirm your selection and run Edit Metadata.

Now, all that we have to do in Azure is give it the list of which descriptors are categorical and which are numeric. Internally, however, when it submits these features to the model, it has to do so numerically, because ultimately the model runs on a computer, and a computer does not understand strings; you cannot do mathematical operations on strings. So even the categorical descriptors will ultimately be converted to their numeric counterparts, by means of what is called one-hot encoding or dummy encoding: all of these categorical descriptors will be converted to their equivalent binary encodings. All of those conversions are abstracted away from us. Azure only expects us to give it the names of the categorical variables, which we have done; the rest of that background work is
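To see what that hidden conversion amounts to, here is a small sketch of one-hot (dummy) encoding in pandas. The hyphenated column names are assumed spellings, and treating this as exactly what Azure does internally is also an assumption; it is only meant to show the idea of binary indicator columns.

```python
# Turn string categories into 0/1 indicator columns, e.g. fuel-type = {gas, diesel}
# becomes two columns, fuel-type_gas and fuel-type_diesel.
categorical_cols = ["symboling", "make", "fuel-type", "aspiration", "num-of-doors",
                    "body-style", "drive-wheels", "engine-location",
                    "engine-type", "num-of-cylinders", "fuel-system"]
encoded = pd.get_dummies(df, columns=categorical_cols)
print(encoded.shape)  # many more columns now, and all of them numeric
```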
going to be done by Azure itself. However, we are still not done: before you run it, make sure that under the Categorical drop-down you select Make categorical. You have to say explicitly that these are your categorical features, and after that, please run Edit Metadata. This is important; you are explicitly specifying that these features are categorical. "And we're done?" Yes. "Okay, so can we move forward now?" Right, so we have now successfully demarcated our categorical features from the numeric ones.

Our next step is to perform data imputation, on both the numeric and the categorical columns. First we will perform the imputation on our categorical data. To that end we need to include the Clean Missing Data component, which you will find under Manipulation: drag Clean Missing Data, place it here, and join these two nodes. Then select Clean Missing Data and launch the column selector, and make sure the exact same 11 categorical features are selected. "For me it is showing all 26 columns in the selected columns list." Then filter it out: send the numeric ones back to the available columns, making sure that only these 11 columns are selected. "So we are doing Clean Missing Data only for the categorical columns?" Yes, this is data imputation for the categorical variables only; we do it first, and right after this we will perform data imputation on the numeric variables.

"Professor, I think Azure ML is a bit slow for me, because Edit Metadata worked, but when I click on Clean Missing Data and launch the column selector it doesn't give me the list of available columns." No, you have to click on By Name: launch the column selector and make sure you select By Name. "Got it, thank you, Professor. Now they are coming, but all of them are coming." Okay, fine, you'll have to filter them out. So, is this done now? "I'm done."

We are not quite done, actually, because we still have to specify what kind of imputation we want to do. For the categorical variables, as always, what do we do, what kind of imputation do we perform? We set the cleaning mode to Replace with mode. Sometimes, depending on the type of categorical variable, we can also perform the imputation using the median, but generally you replace with the mode. "If we go with replacement by the mode, will it not create a bias in the dataset? Because we are filling in whatever the most frequent value is." Bias in the dataset, in the sense you mean, is related to the bias and variance of models, which relates to overfitting or underfitting. "What I mean to say is that we are manipulating the dataset itself, telling it: if the data is not available, assume the most frequent value." That is essentially the only thing you can do with a categorical variable; you cannot replace it with any other summary statistic, you cannot compute the mean of categorical
variables. There is no other way to impute it, except that you can use a custom substitution value; but then you must know what value it is that you want to substitute, it must be a reasonable value, and there must be some basis for how you computed that custom substitution value. So, as a general convention for imputing categorical variables: if it is a nominal categorical variable, replace with the mode; if it is an ordinal categorical variable, you can sometimes use the median as well, because of the ordinality. So please run this.

Once this data imputation step completes, we are going to perform a second imputation, this time on the numeric variables, our numeric descriptors. So drag another Clean Missing Data component from the left-hand side and join the two; make sure you join it from the left-hand output node, because the output node on the left of Clean Missing Data is the clean dataset after applying the imputation.

Actually, before we even do that, let's examine those columns to see whether we have any missing values. If you open the dataset right after the imputation of the categorical variables, you will see that you still have missing values for the numeric columns, because we haven't carried out any imputation for the numeric variables yet; but for the categorical columns the missing values will all be zero now. Fuel type: missing values zero; aspiration, drive wheels, do a quick check, all missing values zero, except for the numeric ones, which may still have missing values. For example bore, a numeric descriptor, has some missing values.

All right, let's go ahead and carry out the data imputation, this time for the missing values in the numeric variables. Join these two, launch the column selector, and this time select all 15 numeric variables, including the price column, including the target variable. Select these remaining 15, which are numeric, and filter out the ones on the other side, which are nothing but the categorical features. Make sure all 15 columns here are numeric, select them, and confirm your selection. "Are we done on this? Let's move forward, Professor." Okay, please do this and then run the component.

"The cleaning mode: do we not need to set it?" Sorry, I forgot to mention that; yes, you need to set the cleaning mode. You can set it to Replace with median, which is what I have done here, or you can also do Replace using MICE. MICE is a very sophisticated algorithm for performing data imputation and may take slightly longer. I have done Replace with median for the numeric descriptors. I could also have done Replace with mean, but there is a reason I chose the median and did not go with the mean. Replacing with summary statistics like the mean, median, and mode is computationally very efficient, because they are not computationally complex, whereas something like Replace using MICE can be computationally
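As an aside, the same two-stage imputation, mode for the categorical columns and median for the numeric ones, can be sketched with scikit-learn's SimpleImputer. The column lists are the assumed ones from the earlier sketches, and the sketch assumes missing entries were read in as NaN.

```python
from sklearn.impute import SimpleImputer

numeric_cols = [c for c in df.columns if c not in categorical_cols]

# Categorical columns: fill missing values with the most frequent category (the mode).
df[categorical_cols] = SimpleImputer(strategy="most_frequent").fit_transform(df[categorical_cols])

# Numeric columns (price included, as in the lab): fill missing values with the column median.
df[numeric_cols] = SimpleImputer(strategy="median").fit_transform(df[numeric_cols])

print(df.isna().sum().sum())  # 0 missing values should remain
```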
complex when you run it on a large amount of data. For this it doesn't matter, because we don't have much data, but when you are talking about very large, highly scalable machine-learning datasets, performing imputation using MICE may take a considerable amount of time; its time complexity is high. So I have gone with Replace with median. Any idea, or any reflection, on why I selected Replace with median rather than Replace with mean? "Because the mean is susceptible to outliers in the data, whereas the median and the mode are more robust to outliers." That's the reason. So please do that and run it.

We have now successfully completed the data imputation on both our categorical and our numeric descriptors. Next, the next data transformation, we are going to perform a feature engineering step: normalizing our data.

"Professor, sorry, I'm getting an error at Clean Missing Data. It shows: cannot process column 'make' of type System.String." For the second Clean Missing Data? "Yes." Then your column selection is probably incorrect; check that you have 15 columns selected here, and you should have 11 columns selected in the first one. "I have 11 columns on the left-hand side and no columns on the right-hand side." You need to have the 15 numeric columns on the right-hand side. "I can see only 11 columns on the left-hand side." Okay, then use the drop-down there: under All Types, go to Numeric, and ensure that the numeric columns are on the right-hand side. Is it selected as All Types? "It is selected as Categorical." And when you select All Types it gives you only these? "Yes." Then something is wrong in the way you selected the features earlier, maybe in Edit Metadata; just check what it is showing. "The same 11 only." So your selection from there onwards is wrong. Edit Metadata should look like this: 15 columns on the left-hand side, which are numeric, and only 11 columns on the right-hand side, which are categorical. "I think I selected wrongly from the beginning, in Select Columns in Dataset." In Select Columns in Dataset you should have selected all the features; how many did you select? "Yes, got it, I'll get it corrected. Thank you so much." Sure.

So now we are going to introduce a feature engineering step, and one such step is called data normalization. Now, can I skip this normalization step, move onward directly, and split my data, or do you think it is a necessary step? "It is necessary, Professor, because the numeric data that we have might have different ranges: one feature might vary from 0 to 1 and another from 0 to something much larger." Yes, that's
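A tiny illustration of that point about robustness; the numbers below are made up purely to show the effect, they are not taken from the automobile data.

```python
import numpy as np

bore = np.array([3.1, 3.2, 3.3, 3.2, 31.0])  # the last value is a data-entry outlier
print(np.mean(bore))    # 8.76: dragged far away from the typical value by one outlier
print(np.median(bore))  # 3.2:  unaffected, so it is the safer fill-in value
```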
true. We will have to perform this feature engineering step of data normalization, because if you look at the numeric descriptors in the data, they all have different units. Normalized losses has some unit (you can also see its missing-value count is zero now; it has been successfully imputed), and some of the other features, like wheel base, may be in centimetres, while the unit for curb weight is probably kilograms or pounds, and engine size is something in inches or centimetres. Compression ratio, on the other hand, has no unit, because it is a ratio. So the idea is that not only do you have varying units, you also have varying ranges; by the range of a feature I mean its min and its max, and every feature you see has a different min and max. They also have varying scales: for one feature the median is around 3.29, while for another the median is 65. That is a very steep change in the scale of the values, not only in the range.

What normalization allows us to do is homogenize these differing scales, ranges, and units across all these features, bring everything down to a common scale, and constrain everything within a certain interval bound, so that everything gets homogenized and all the features are on a level playing field. Features with larger ranges or larger scales should not dominate features with smaller scales; otherwise, during training, it may give the model the impression that features with larger scales are more important, and they tend to get attached to larger regression coefficients, larger weights, when building the model, simply because of their scales. The model starts treating those features as overly important just because of their magnitudes, and that is not desired. Normalization allows us to remove those disparities, so we will need to normalize. I have already discussed the different types of normalization; we will use min-max here.

So, again, click on Normalize Data. You will find the Normalize Data component under Data Transformation, under Scale and Reduce; drag it here, click on Normalize Data, set the transformation method to MinMax, and check the box that says "Use 0 for constant columns when checked"; please make sure this checkbox is ticked. Then launch the column selector. Now, we can only normalize numeric descriptors, so again filter out everything else and keep just the numeric descriptors, and make sure
you remove price. Price is not a descriptor, it is not a feature, it is the target variable, so make sure you remove price from the selection. Select all of the remaining numeric descriptors and confirm your selection; there should be 14 columns selected, which are all the numeric features. Then run it.

"Professor, could you please explain the MinMax transformation method?" Min-max normalization is a generic type of normalization that does not make any inherent assumption about the underlying data distribution. It is very generic: when you don't know anything about the underlying distribution of your data, this is something you can use. If you do know something about the underlying distribution, and you know it is normal, then you can use z-score normalization instead, which is really more related to standardization. What min-max does is constrain the dynamic range of your data to a hard-bounded interval of 0 to 1: all the values, across all the samples and all the selected features, get hard-bounded between 0 and 1, which means they get bounded to a common scale. It does that through a mathematical formula, and it is because of that formula that we have the "Use 0 for constant columns" checkbox. If this is something you don't know, let me know and I'll explain, but I think you do. "Yes, thank you."

Normalize Data is taking a little bit of time, I don't know why. "Professor, in the meantime I have a question; this is Raj. I see that after Normalize Data you have gone to Split Data, but we have not done the conversion of the categorical variables to numeric variables, the one-hot encoding." No, we are not going to do that ourselves; that is not ours to do. All you need to do is specify, demarcate, the categorical variables in Edit Metadata, and the conversion of the categoricals to their numeric, binary encodings will be done by Azure; that is all abstracted away from you. Those computations are abstracted away from us; Azure will take care of all of it. But ultimately that is what is going to happen: everything will be passed to the computer as numbers. "So Edit Metadata is essentially part of the one-hot encoding: when you specify the categorical variables, it does the conversion underneath." Yes, underneath; and that is exactly why it inquires with you, asking you to specify whether something is categorical as opposed to numeric. "Thank you, Prof." You're welcome.

So now, the next step: if you have already normalized your data and everything has gone well so far, we can introduce Split Data. You will see Split Data under Data Transformation, under Sample and Split; just drag it here and place it here. The way we are going to configure Split Data is by setting
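That formula is simply x' = (x − x_min) / (x_max − x_min), computed column by column, which is also why constant columns need special handling: their max equals their min, so the denominator would be zero, hence the "Use 0 for constant columns" option. A hedged scikit-learn equivalent, reusing the assumed column lists from the earlier sketches:

```python
from sklearn.preprocessing import MinMaxScaler

feature_cols = [c for c in numeric_cols if c != "price"]  # 14 numeric descriptors, target excluded

# Rescale each numeric descriptor to the interval [0, 1]:
# x' = (x - x_min) / (x_max - x_min), applied per column.
scaler = MinMaxScaler()
df[feature_cols] = scaler.fit_transform(df[feature_cols])

print(df[feature_cols].min().min(), df[feature_cols].max().max())  # 0.0 and 1.0
```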
the fraction of rows in the first output dataset to 0.95. What does "fraction of rows in the first output dataset" mean? The first output dataset refers to output node number one of Split Data, and the fraction of rows asks us to specify what fraction of the rows is going to emanate from that node number one. We are going to put 0.95, because we want node number one to represent our training dataset; consequently, 1 minus 0.95, or 0.05, which amounts to 5% of the dataset, will be made available at node number two. Also make sure Randomized split is checked, and put in some random seed.

We are also going to perform stratified splitting, so set Stratified split to True, launch the column selector, and I'll choose to select number of doors; select num-of-doors as the stratification column. A stratification column is meant to be chosen such that it can effectively group your data into two or more groups; selecting such a column helps when carrying out stratification.

"So what is stratification? Is it a kind of sampling?" It is related, but not really. Stratification, as applied to classification for example, means maintaining the same proportions of instances as in the original dataset and replicating those proportions in both your training and your test sets. That enables better convergence of the model, helps the model converge faster, and also helps it perform better. "But how do we interpret the column we need to set? If I understood correctly, stratification means we should take the same proportions from the original dataset into the sampled data." Yes, something like that. A good choice of stratification column is one that can effectively divide your data into two or more groups, that stratifies your data; select such a column and it will be a reasonable choice for carrying out stratification.

"Sorry, Professor, I just want to find out why you chose number of doors as your column for the stratification. Is there a reason you chose that attribute?" Number of doors effectively divides the data into a small number of groups: a car is either a two-door or a four-door, plus possibly the missing category, so two or three groups. You could use other stratification columns as well; there is no hard and fast rule. But do not use something that splits the data into too many groups; something that effectively stratifies it into maybe two to four groups is okay. "So could we also use engine type, for example?" Yes, you can use engine type. The rationale is simply that the column helps us break the data into
groups, to stratify it into different groups. "Sure. So the rule of thumb is about two to four groups, that's what you're saying?" Right, two to four groups on average, at least. "Okay, thank you, Professor."

"Professor, in Clean Missing Data there are two outputs, one on the left-hand side and one on the right-hand side. Does the right-hand side mean it is not cleaning the missing data?" The left-hand output is the cleaned dataset; the right-hand output is the cleaning transformation, basically the mapping that was used to perform the data imputation, of which we have no use here. The imputed dataset is only made available on the left-hand node; the right-hand node is just the mapping, the mathematical transformation that was used to impute the data, and it does not contain the actual data. "Is it the same for Normalize Data?" Yes; there, again, the second output is the transformation function, a mathematical function of which we have no use. "Got it, thank you."

Right, so where are we now? Split Data. Please make these selections and put in the random seed. I think you know why the random seed is important: it is generally useful for model reproducibility. If you want to keep reproducing the same model across multiple reruns of the experiment, or even multiple runs of the model itself, you don't want the data to change, because if the data changes, the model also changes. You want a model that is reproducible, that is identical every time you rerun it, and from the point of view of model reproducibility we set a random seed. Setting it to zero will unset the random seed, so instead of zero, use any other number as your random seed; that seed's data type is an unsigned integer, so it has some upper range that you cannot exceed. All right, please configure this and run it.

Now, so far we have only bifurcated the data into two sets, but because we are also going to perform some model hyperparameter tuning, we actually want to split our dataset into three parts, not just two: training, test, and a third part called the validation dataset. The validation dataset is what we will use when we perform the hyperparameter tuning; it will effectively be used as the test dataset during the tuning. So, to divide the data into three parts, introduce a second Split Data component here and join it with the first one. Again, for the fraction of rows, we'll say 0.95. In effect, whatever emanates out of node number one of this second split will become our training dataset, whatever comes out of its node number two will become our validation dataset, and whatever comes out of node number two of the previous Split Data component will become our test data: training data, validation data, and test data. The way we are going to configure
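The same three-way carve-up can be sketched with two chained scikit-learn splits. The 0.95 fractions and the num-of-doors stratification column are taken from the lab; the variable names, the seed value, and the use of train_test_split itself are illustrative assumptions, not what Azure runs internally.

```python
from sklearn.model_selection import train_test_split

X = df.drop(columns=["price"])
y = df["price"]

# First split: 95% kept aside, 5% becomes the test set (stratified on num-of-doors).
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.05, random_state=42, stratify=X["num-of-doors"])

# Second split: 95% of the remainder is the training set, 5% is the validation set.
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.05, random_state=42, stratify=X_rest["num-of-doors"])

print(len(X_train), len(X_val), len(X_test))  # close to the lab's 186 / 9 / 10 split
```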
this is: for the second Split Data, again make sure the fraction is 0.95, put in some random seed, set Stratified split to True, and you can use the same stratification descriptor here. Then run the second Split Data.

So let's look at how many samples we have. For the training set we have 186 rows, for the validation dataset we have 9 rows, and for the test dataset we again have a handful of rows, 10. That is good enough.

This sets the stage for introducing the machine learning algorithm. Go to Machine Learning on the left-hand side; under Initialize Model go to Regression, and under Regression you will see Linear Regression. Drag Linear Regression and place it somewhere here; we need to configure it now. What I also want you to do is start another Linear Regression in parallel; I'll tell you the reason, but for now just drag a second Linear Regression module and place it on this side.

Now let's look at the Linear Regression component. If you click on it, we need to configure it, and there are two ways to do this. There is something called the solution method, the method applied to perform the linear regression, and there are two types: one is called Ordinary Least Squares, and the other is Online Gradient Descent. Ordinary least squares essentially minimizes the sum of squared errors; that is the way it does its optimization, trying to find the best-fitting line by minimizing the sum of squared errors. It also happens that ordinary least squares is a closed-form analytical solution: there is a mathematical solution in which you have to take a matrix inverse to arrive at the final answer. That is not so for online gradient descent: online gradient descent is more of an approximate solution, an iterative optimization technique that gives us an approximation of the actual value.

The difference between the two is that ordinary least squares tends to occupy much more memory and is computationally very intensive, but it is able to arrive at a much better solution after its optimization: the most suitable, optimal, best-fit line for your data once it has done all its computations. Its time complexity is quite high. For online gradient descent the time complexity is much lower; it is simpler and not as computationally intensive, but it is iterative. One more difference is that ordinary least squares tends
to process the entire dataset all in one go, whereas online gradient descent takes it step by step, sample by sample. Also, because online gradient descent is an iterative optimization procedure, it may sometimes fail to converge properly; it may get stuck in a local optimum. So, since there are these two solution methods for performing linear regression, for finding the best-fit line, we will explore both of them and compare the two models. That's why, for the Linear Regression module on the left, choose Ordinary Least Squares; for the one on your right, choose Online Gradient Descent.

Let's take a quick look now at how to configure the parameters. When you choose ordinary least squares, there is something called the L2 regularization weight. Any idea what this hyperparameter is? L2 regularization is a type of regularization used to prevent overfitting. There are two types of regularization, L1 and L2. In L2 regularization, which is called ridge regression, we tend to bring the values of all the regression coefficients down to nearly zero, but not exactly zero. There is another form of regularization called lasso regularization (lasso regression): in the case of lasso regularization we actually bring the values of several of the regression coefficients, the weights, down to exactly zero. Lasso regularization therefore tends to act as a feature selection process, because it weeds out several of the features: it sets several regression coefficients to zero, thereby eliminating those features. So lasso regularization can be used when you have very high-dimensional data, as a sort of feature selection technique by which you can get a much smaller subset of the features.

In our case we don't have so many features, so ordinary least squares with L2 might perform better. What it employs here is L2, that is, ridge regression, and in the case of ridge regression the regression coefficients are not pinned down to exactly zero but near zero; none of the features is weeded out, all the features remain exactly as they are, but the coefficients are typically brought down to small values. When I say features, I mean the independent variables: in regression you call the features the independent variables, and the output, the target, the price column in this case, is our dependent variable. That is the effect of what we call L2 regularization, or ridge regression. "Professor, so L2 regularization is ridge regression and L1 is lasso regression, right?" Yes.

Now, one thing: what is this L2 regularization weight? There is a multiplicative term, the L2 regularization weight times the L2 norm of the regression coefficients. So, if we penalize... well, not quite like that; let me explain. If we
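To make the contrast between the two solution methods concrete, here is a hedged scikit-learn sketch: Ridge with a direct solver stands in for the closed-form, L2-regularized least-squares fit, and SGDRegressor stands in for the iterative online gradient descent. The variables come from the earlier split sketch, and none of this is the Azure module itself.

```python
from sklearn.linear_model import Ridge, SGDRegressor
from sklearn.metrics import mean_squared_error

X_train_num = pd.get_dummies(X_train)   # simple stand-in for Azure's internal encoding

# Closed-form, L2-regularized least squares: solves for all coefficients in one shot.
ols_like = Ridge(alpha=0.001, solver="cholesky").fit(X_train_num, y_train)  # small L2 penalty; the weight is discussed next

# Online gradient descent: iterative and approximate, visiting samples one at a time.
sgd = SGDRegressor(penalty="l2", alpha=0.001, eta0=0.1,
                   learning_rate="invscaling", random_state=42).fit(X_train_num, y_train)

for name, model in [("closed-form ridge", ols_like), ("online gradient descent", sgd)]:
    print(name, mean_squared_error(y_train, model.predict(X_train_num)))
```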
blow up the value of this regularization weight, if we make it much, significantly larger, then, because the product still has to balance out, it is like a times b: if I increase a too much, I have to decrease b proportionately to maintain the same value of the product. That is exactly what happens here. This L2 regularization weight is often termed lambda, and the term is lambda times the norm of w. If we select a very high value for lambda, it automatically forces the weights towards zero, to near zero, in order to keep that product in check; that is what we call imposing a penalty on the weights. If you set a very large regularization weight, it is equivalent to setting a very high penalty on the weights. On the other hand, if we set the value of the L2 regularization weight very low, that may blow up the weights; the regression coefficients may attain large values. So this term balances goodness of fit against model complexity.

Initially you will see that the default value Azure gives for the L2 regularization weight is 0.001; you are probably seeing 0.001 as the default here. "Yes, Professor." I have increased it roughly 250 times, to 0.25. I have basically magnified this L2 regularization weight and kept it at a fairly large value, and I have done that intentionally: the net effect is that it will forcefully pin down, or lower, the values of all the weights, the regression coefficients, and make them smaller, close to zero, and that will probably have good consequences in terms of the model's convergence. Also please remember to set the...

"Sorry, Professor, one question. When we actually run this linear regression, at the back end will there be a multi-layer perceptron kind of network, and when you say the weights, do you mean the neurons?" No, no, there is no MLP at the back end; it is just linear regression. When we run this, it is going to create a linear regression model: if our target variable price is denoted by y, then mathematically y = β₀ + β₁x₁ + β₂x₂ + ⋯, and this is linear. There are no nonlinear terms, nothing like x₁·x₂ or x₁² or x₁x₂²; everything is linear, like a linear polynomial. There will be no layer of neurons at the back. It simply builds a linear regression model by computing these regression coefficients and attaching them to the features, to the independent variables that we have selected: each of those 25 independent variables, the features, will have a regression coefficient attached to it, in terms of its relative importance as the model sees it. The model will
Another learner asked whether the weight I keep mentioning refers to the betas of our model. Not quite; the L2 regularization weight is not the betas themselves, so let me show it on the notepad. The loss function is a combination of two factors: the sum of squared errors, Σᵢ (yᵢ − ŷᵢ)², plus lambda times the squared L2 norm of the weights, λ‖β‖². Those weights inside the norm are your regression coefficients, the β₀, β₁ and so on; the lambda multiplying the norm is the L2 regularization weight we set in the component. When you regularize, this additional term, the regularization term, is added to your main objective, the loss function. If I increase lambda by a large amount, it penalizes the betas and makes them very small; if I reduce lambda to a very small value, meaning I am hardly penalizing the betas at all, the beta values can blow up. What we want from regularization is to keep the model simple, so we penalize the betas and push them toward zero, but there is a trade-off: too large a lambda will not do either, because then everything collapses toward zero. So we have to choose lambda so that it penalizes the betas just enough to reduce overfitting. That is what the L2 regularization weight is; it is the lambda in that penalty term, and here we have set it to 0.25. Without regularization your loss function is just the sum of squared errors (or the mean squared error); with regularization we add that extra term, lambda times the squared L2 norm of the regression coefficients.
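That loss can be written out directly. A minimal numpy sketch of the same expression, L(β) = Σᵢ (yᵢ − ŷᵢ)² + λ‖β‖², with variable names and toy data of my own choosing:

```python
# Sketch of the regularized squared-error loss:
#   L(beta) = sum_i (y_i - yhat_i)^2 + lambda * ||beta||_2^2
import numpy as np

def ridge_loss(X, y, beta, lam):
    residuals = y - X @ beta              # y_i - yhat_i
    sse = np.sum(residuals ** 2)          # sum of squared errors
    penalty = lam * np.sum(beta ** 2)     # L2 regularization term
    return sse + penalty

X = np.random.default_rng(0).normal(size=(50, 3))
beta = np.array([5.0, -2.0, 0.5])
y = X @ beta

# The same beta is penalized more heavily as lambda grows:
print(ridge_loss(X, y, beta, lam=0.001), ridge_loss(X, y, beta, lam=0.25))
```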
Okay, thank you. Now, also please tick "Include intercept term", because for linear regression you want the y-intercept of the line, and set a random number seed of your choice. Once you have done that, right-click the component and Run Selected; it should not take much time.

Now for the other side. We are comparing linear regression by ordinary least squares against online gradient descent, so click on the other Linear Regression module; this is where we employ the iterative optimization procedure, online gradient descent, which is an approximation of the exact analytical solution that OLS gives. Here we have to set the learning rate; leave it at 0.1 initially. What is the learning rate, by the way? One learner suggested it is the speed at which we reach the global minimum, and that's right. The learning rate controls the magnitude of your weight updates (by weights I again mean the regression parameters): a larger learning rate produces larger weight updates and larger steps toward the global optimum, the global minimum, while a smaller learning rate produces smaller updates. Gradient descent is an iterative optimization procedure; in optimization we either minimize or maximize, and here we are minimizing, minimizing the overall error. In the gradient descent weight-update formula the learning rate is usually denoted by alpha or eta. Normally we can start with a larger learning rate, but over successive epochs we should decrease it, so check the "Decrease learning rate" box. If you don't decrease it, then after several epochs the updates may start oscillating, overshoot the global minimum, land elsewhere, and the algorithm may never converge. Next, the number of training epochs.
You can put in something like 100. What is an epoch? One epoch is when the model has seen every sample in your data set once, a full pass over the data. So we'll run it for 100 epochs. Then there is the same L2 regularization weight we saw before; set it to 0.25 here as well. Put in a random number seed; the remaining options, normalizing the features and averaging the final hypothesis, can stay as they are. Once that is done, Run Selected.

A learner asked: normally we also have a stopping criterion, so do we not need to specify one here? Here the stopping criterion is the number of epochs. Is the comparison of the error against some minimum not happening, then? It is: online gradient descent compares the error and keeps updating the weights; once the global minimum is reached there are no further weight updates and the algorithm has converged. The learner pointed out that in practice the error never actually reaches zero, which is true: in practical cases it hovers near zero, or near some small value. In that situation you can increase the number of training epochs, say to a thousand, and check whether the error stays the same; if there are no further changes to the error after a certain number of epochs, the algorithm has converged. So yes, in the real world you do some trial and error. And the residual error remains because the procedure may not have found the global minimum; it may have settled in a local optimum near it. Let's set it to 1000, or keep it at 100 if you want the computation light, and run this component.
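To make the online gradient descent settings concrete, here is a hedged, from-scratch sketch of the idea: one sample per update, a learning rate that decays over epochs, a fixed number of epochs as the stopping criterion. It is not the Studio's actual implementation, and the data, decay schedule, and default values are my assumptions.

```python
# Sketch: online (stochastic) gradient descent for linear regression with a
# learning rate that decays across epochs. Illustrative only.
import numpy as np

def online_gd(X, y, lr=0.1, n_epochs=100, lam=0.001, decrease_lr=True, seed=1234):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0                                        # intercept term
    for epoch in range(n_epochs):
        eta = lr / (1 + epoch) if decrease_lr else lr   # decaying learning rate
        for i in rng.permutation(n):               # one epoch = one pass over all samples
            err = (X[i] @ w + b) - y[i]
            w -= eta * (err * X[i] + lam * w)      # gradient of squared error + L2 penalty
            b -= eta * err
    return w, b

X = np.random.default_rng(0).normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 3.0
w, b = online_gd(X, y)
print(np.round(w, 2), round(b, 2))
```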
With that done, the next stage is to tune the hyperparameters of the model. Go to Machine Learning, then Train, drag the Tune Model Hyperparameters component into the workspace, and let's make the connections. Tune Model Hyperparameters takes three inputs: the ML algorithm (the untrained model), the training data set, and the validation data set. The training data comes from the second Split Data component, so, as I told you earlier, that becomes our new training set, this is our validation set, and the output of the first Split Data remains our test set. The data is now split three ways, into training, validation, and test, because when we perform hyperparameter tuning we cannot give the model the test data; the test set may only be seen during scoring, during model testing. So we create a separate validation set with which to validate the candidate models and find the best-performing one. The whole intention of hyperparameter tuning is to find the configuration of hyperparameters that produces the best-performing, most optimal model. Please make sure you wire these connections correctly.

In the Tune Model Hyperparameters properties you have to specify the parameter sweeping mode, which is one of three options. "Entire grid" means the entire hyperparameter space: if your hyperparameter is, say, the number of epochs or the L2 regularization weight, it takes every value in the grid for that hyperparameter and searches exhaustively through the whole space for the combination that delivers the best-performing model; searching the entire grid is exhaustive, so it is computationally expensive and time-consuming. "Random grid" means values of the hyperparameters are randomly sampled from the grid and only those sampled values are evaluated; that costs far less time and compute because it searches only a small part of the grid, so it is much more efficient. Finally, "Random sweep" takes ranges for the hyperparameters, randomly samples a few values from those ranges, and runs only for those sampled values. Those are the three modes.
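Before configuring the sweep, the three-way split described above looks like this in plain Python. A hedged sketch: the two chained splits play the role of the two Split Data components, and the split fractions and synthetic data are assumptions.

```python
# Sketch: one dataset split three ways (train / validation / test),
# mirroring the two chained Split Data components. Fractions are assumptions.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=205, n_features=10, noise=15.0, random_state=0)

# First split: hold out the test set (the first Split Data component).
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.3, random_state=1234)

# Second split: carve the validation set out of the remainder
# (the second Split Data component feeding Tune Model Hyperparameters).
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.3, random_state=1234)

print(len(X_train), len(X_val), len(X_test))
```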
I don't recommend the entire grid, because it can take a very long time; we will use the random grid. With random grid selected you will see a setting called "Maximum number of runs on random grid"; set it to 5. Each run it picks a different value of the hyperparameter from the random grid, trains the model on the training data, and validates it on the validation data. So it trains and validates five times, with five different hyperparameter values, and reports which run produced the best-performing model; that model is what appears at the "Trained best model" output of Tune Model Hyperparameters. Also put in a random seed so that any randomization is reproducible.

It will then ask us to select the label column. The label column is price, so launch the column selector, select by name, and pick price. The classification metric setting does not matter, because we are solving a regression problem; for the metric for measuring performance for regression, choose root mean squared error. At this point please save your work using the save icon at the bottom of the screen. Once saved, right-click the component and Run Selected. It performs five runs, training and validating the model five times with five different hyperparameter values, yet the random grid is actually fairly quick; this completed in a couple of seconds, whereas the entire grid could take three, five, even ten minutes.

Now let's do the same on the other side, where we are using online gradient descent: add a Tune Model Hyperparameters component there and configure it the same way. Actually, let's change one thing: set the maximum number of runs on the random grid to 10, and I'll change it to 10 on the first one as well and rerun it, so both components do 10 runs, each with a different hyperparameter value. Put in a random seed and select price as the label column there as well.
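What Tune Model Hyperparameters does under "random grid" with a maximum number of runs can be sketched by hand: draw a handful of hyperparameter values, train on the training set, score each candidate on the validation set, and keep the best. A hedged Python analogy, with sklearn's Ridge standing in for the Studio's learner and all values assumed:

```python
# Sketch: random search over the L2 regularization weight, keeping the model
# with the lowest validation RMSE ("random grid", maximum number of runs = 5).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=205, n_features=10, noise=15.0, random_state=0)
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.3, random_state=1234)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.3, random_state=1234)

rng = np.random.default_rng(1234)                     # random seed for reproducibility
candidate_lambdas = rng.uniform(0.001, 1.0, size=5)   # 5 runs on the "random grid"

best_rmse, best_model = float("inf"), None
for lam in candidate_lambdas:
    model = Ridge(alpha=lam).fit(X_train, y_train)                   # train
    rmse = mean_squared_error(y_val, model.predict(X_val)) ** 0.5    # validate
    if rmse < best_rmse:
        best_rmse, best_model = rmse, model                          # "trained best model"

print(f"best lambda = {best_model.alpha:.3f}, validation RMSE = {best_rmse:.2f}")
```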
Set the metric for measuring regression performance there to root mean squared error as well (RMSE, the square root of the mean of the squared errors) and run it.

A learner asked: do we have to add the Train Model component, and do we still need Tune Model Hyperparameters? You need Tune Model Hyperparameters first; only then do you know which is the optimal model, and only then can you take the trained best model, with the most optimal set of hyperparameters, and use it to train your model downstream. Please make sure the maximum number of runs on the random grid is set to 10 for both Tune Model Hyperparameters components, then run both.

If you click on the first one and choose Visualize, you can see the results of those runs, the sweep results, and it reports the best sweep result. Likewise, if you click on the second output node of Tune Model Hyperparameters and visualize it, you can see all the regression coefficients that have been determined: against each feature you will see its coefficient. Notice that some carry a negative sign; a negative sign means an inverse relationship between that independent variable, that feature, and the dependent variable. If one increases, the other decreases.
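That sign reading, a negative coefficient meaning the feature moves opposite to price, is easy to see if you print the fitted coefficients next to the feature names. A small hedged sketch; the feature names and data here are invented for illustration, not taken from the automobile data set.

```python
# Sketch: inspecting regression coefficients; a negative weight means an
# inverse relationship with the target (here, price). Names are hypothetical.
import numpy as np
from sklearn.linear_model import LinearRegression

feature_names = ["engine-size", "highway-mpg", "curb-weight"]   # hypothetical subset
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(size=100)

model = LinearRegression().fit(X, y)
for name, coef in zip(feature_names, model.coef_):
    direction = "inverse" if coef < 0 else "direct"
    print(f"{name:>12}: {coef:+.2f}  ({direction} relationship with the target)")
```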
Once that is done, we proceed to train the model. Under Machine Learning, under Train, you will find Train Model; drag it in. Feed the trained best model from Tune Model Hyperparameters into it, together with the training data set, then launch the column selector and give it the label column: drag price into the selected columns and confirm your selection. Right-click and Run Selected; it will take a little while. Do the same with a second Train Model component on the right-hand side, fed by the training data and by the trained best model from its own Tune Model Hyperparameters, again with price as the label column, and run it.

Now that training is complete, it is time to test the model. Go to Machine Learning, then Score, and drag a Score Model component in on the right-hand side; feed in the trained model and the test data, which comes from the first Split Data component, and run it. Likewise, introduce another Score Model on the left, feed it the trained model on the left-hand side and the test data from the first Split Data, and run it.

Finally we evaluate the two models. Under Machine Learning, under Evaluate, you will find Evaluate Model; drag it into the workspace, connect the two scored outputs, right-click, and Run Selected. Once it has finished executing, right-click and Visualize; this lets us compare the performance of the two regression models. Looking at the results, the one on the left is significantly better. What is the coefficient of determination, by the way? Applied to linear regression, it gives us a sense of the goodness of fit of the model: here it tells us that 82.6% of the variance in the dependent variable is explained by the independent variables, with the remaining roughly 17% left unexplained, attributable to factors not in the model. So the left model is quite good at 0.826, compared with only 0.387 from online gradient descent, a substantial difference; if you collected repeated results from the two and ran a paired t-test, it would probably come out statistically significant as well. How do we account for it? The model on the left comes from ordinary least squares, and the one on the right, which performs much worse, from online gradient descent. Online gradient descent is an iterative optimization procedure, an approximation, whereas ordinary least squares is a closed-form solution: it is computed exactly, mathematically, with some transposes and a matrix inverse, and that tends to perform significantly better here than an approximate method.
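The closed-form point can be made concrete: OLS has an exact solution, β = (XᵀX)⁻¹Xᵀy, computed with transposes and one matrix inverse (or, better, one linear solve), whereas gradient descent only approaches it iteratively. A minimal numpy sketch on toy data of my own:

```python
# Sketch: the OLS closed-form (normal equation) solution,
# beta = (X^T X)^(-1) X^T y, computed exactly in one linear solve.
import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])   # intercept column + 2 features
true_beta = np.array([3.0, 2.0, -1.0])
y = X @ true_beta + rng.normal(scale=0.1, size=200)

beta_closed = np.linalg.solve(X.T @ X, X.T @ y)   # exact; no iterations, no learning rate
print(np.round(beta_closed, 3))                   # close to [3.0, 2.0, -1.0]
```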
Now, what are these different metrics? Mean absolute error is the average absolute error; you would use it, for example, to compare two sales-forecasting models and determine which has the lower average error in the predicted sales. Root mean squared error is the square root of the mean of the squared errors, the square root of MSE. Because of the squaring, MSE amplifies the residuals, the errors; taking the square root de-amplifies them again and brings the value back to the original units of the target, which makes RMSE convenient to interpret. Relative absolute error allows a straightforward comparison between different models by normalizing the errors relative to a baseline model, essentially the mean predictor; you would use it when, say, comparing predictive models for house prices to see which performs better relative to simply predicting the mean. Relative squared error is similar but based on squared errors; it is often used, for example, to compare financial models against the natural fluctuation in stock prices.
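A hedged sketch tying those metrics to their formulas, computed from first principles on toy predictions (the numbers are made up; the mean of the true values plays the role of the baseline predictor):

```python
# Sketch: the regression error metrics reported by Evaluate Model,
# computed by hand on toy values. Illustrative only.
import numpy as np

y_true = np.array([10.0, 12.0, 15.0, 20.0])
y_pred = np.array([11.0, 11.5, 16.0, 18.0])

abs_err = np.abs(y_true - y_pred)
sq_err = (y_true - y_pred) ** 2
baseline = np.mean(y_true)                                 # the "mean predictor" baseline

mae  = abs_err.mean()                                      # mean absolute error
rmse = np.sqrt(sq_err.mean())                              # root mean squared error
rae  = abs_err.sum() / np.abs(y_true - baseline).sum()     # relative absolute error
rse  = sq_err.sum() / ((y_true - baseline) ** 2).sum()     # relative squared error
r2   = 1 - rse                                             # coefficient of determination

print(mae, rmse, rae, rse, r2)
```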
At this point a learner said: I followed exactly what you did, with the same parameters, except that I changed the random seed, using 123 where you used 1234; I expected slight differences, but my coefficient of determination is 0.98 and my mean absolute error is 781, significantly better than yours. Leaving overfitting aside, is that expected just from a change in the random seed? Yes, and not only that: even with the same seed there is no guarantee that the same set of random numbers is produced in the same sequence across environments; the numbers, and the order in which they appear, can vary. Another learner added that for them the results came out the other way around: the model on the right-hand side performed better, with a coefficient of determination of 0.79 against 0.27 on the left, only the random seed having been changed. That is not what you would normally expect; usually OLS gives you the better result, so it is worth double-checking whether any other parameter was changed, but yes, the randomization alone can swing it.

That prompted the question: what is the rule of thumb for choosing a random seed, given that everyone may get a different output? There is no rule of thumb as such for selecting a seed, only guidelines and best practices, which I mentioned: if you want your results to be reproducible, fix a random seed and do not change it between runs. Do the seeds for all the components have to be the same? No, there is no rule that they must match; you can use the same seed everywhere if you like, or different seeds. What matters for reproducibility is simply that you do not change the seed you selected between runs of the experiment.
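The reproducibility guideline is easy to demonstrate, with the caveat raised above that moving across machines or runtimes can still change things. A hedged sketch within one environment:

```python
# Sketch: within one environment, a fixed random seed makes the split (and
# hence the downstream results) reproducible; a different seed changes the split.
import numpy as np
from sklearn.model_selection import train_test_split

data = np.arange(10)

a1, _ = train_test_split(data, test_size=0.3, random_state=1234)
a2, _ = train_test_split(data, test_size=0.3, random_state=1234)
b1, _ = train_test_split(data, test_size=0.3, random_state=123)

print(np.array_equal(a1, a2))   # True: same seed, same split, reproducible
print(np.array_equal(a1, b1))   # almost certainly False: different seed, different split
```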
Another learner asked whether the coefficient of determination is the same as the R-squared covered by Professor Papu; yes, they are the same. A follow-up question: do we get the p-values for the corresponding beta coefficients? No; in the trained model you can only visualize the coefficients, the regression weights, and no p-values are reported. Then how do we know whether those weights are significant? Professor Papu had shown that the p-values tell us which beta coefficients are significant. Are you referring to the F-test? No, came the reply: the F-test concerns the overall significance, the validity, of the model, whereas the question was about the p-value of each individual beta coefficient. That is something Azure does not report; if you want them, you can compute them additionally in Python and check whether the coefficients are statistically significant. What about the p-value for the overall model significance, the F-test? That is not reported by Azure ML either; the only things you see here are the regression error metrics together with the R-squared value. If you run OLS regression in Python you get a large number of these tests reported, such as the Jarque-Bera test for whether the residuals are normally distributed, confidence intervals, and the F-test, but they are not reported here in Azure. (Whether Python is covered in this programme I am not sure; you would have to ask the programme office.)

One learner guessed that p-values might be used internally to remove insignificant parameters, perhaps inside something like Remove Duplicate Rows. No: rows and columns are different things. The columns are your features, which is where p-values apply; the rows are the data points. They are two different concepts, and there is no mention of p-values anywhere in the tool, not even in the normalization options. As someone observed, that is the disadvantage of a tool like this, where everything is done for you, versus Python, where you have more control over what you get out of it; it is a balancing act between what you want and which tool you use. Does it perhaps pick the best model based on p-values computed in the background? There is no documented indication that it calculates any p-values at the back end, and no such feature is available.
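If you do want per-coefficient p-values, here is a hedged sketch of how they are typically obtained in Python with statsmodels (assuming statsmodels is available; toy data, not the automobile set, and not something Azure ML Studio Classic exposes):

```python
# Sketch: fitting OLS in statsmodels to get per-coefficient p-values,
# the overall F-test, confidence intervals, and related statistics.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 2.0 * X[:, 0] + rng.normal(size=100)      # second feature is pure noise

X_const = sm.add_constant(X)                  # add the intercept column
results = sm.OLS(y, X_const).fit()

print(results.pvalues)    # p-value per coefficient (intercept, x1, x2)
print(results.f_pvalue)   # overall model significance (F-test)
# results.summary() also reports confidence intervals, Jarque-Bera, etc.
```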
As another learner remarked, before the lectures with Dr. Papu this much would have seemed sufficient; it is only because we now know about p-values, and about significant versus insignificant columns, that we go looking for them here. To recap: p-values test for statistical significance in hypothesis testing, in t-tests for example, where you check whether you can reject the null hypothesis against the alternative when the p-value falls below a chosen threshold.

One learner pointed out a practical workaround: Professor Papu showed how to use Excel to compute the regression and its p-values, so this can become a two-step process. Take the data set into Excel, run the regression there as Dr. Papu showed, see which columns are significant, then come back to Azure, go up to Select Columns, keep only the columns the Excel p-values marked as significant, and build the regression model in Azure on those. Another learner asked ChatGPT about it and got the answer that Azure ML Studio focuses on predictive modelling and operationalizing machine-learning pipelines rather than on statistical inference, which is why p-values are not surfaced; that sounds right, since you do not see much of the statistical packages here, and the emphasis is on operationalizing rather than on statistically explaining why something happens.

Okay, then, we'll stop here; if you have any more questions, let me know, or email me at the address I've shared in the chat window. One learner asked whether there will also be a hands-on session on logistic regression. I'll check whether that is planned; this course is Foundations of AI and ML, so I'll look at the list of labs. Logistic regression is really a classification technique: you threshold the linear-regression output and turn it into a classification problem rather than a regression problem, so if need be we can take it up, subject to the lab plan. Also keep in mind that what you solved today is multiple linear regression, not linear regression in one independent variable but in multiple independent variables; I just wanted to add that.

One more question came up: since different random seeds give different outputs, and several of you also saw coefficients of determination around 0.98, what is the decision point if this were a real business problem, with the same data and different seeds producing different results? One reasonable and very easy way to handle it: if ten people are working on it, take the average; repeat the experiment multiple times and report the mean of those results.
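The "repeat and average" suggestion can be written as a small hedged sketch: run the same pipeline under several seeds and report the mean and spread of R-squared rather than a single lucky value. The seeds, model, and data here are assumptions for illustration.

```python
# Sketch: repeating the experiment under several random seeds and reporting
# the mean and spread of R^2 instead of one run's value. Illustrative only.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=205, n_features=10, noise=20.0, random_state=0)

scores = []
for seed in [1, 123, 1234, 2024, 42]:            # different seeds, same pipeline
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=seed)
    model = Ridge(alpha=0.25).fit(X_tr, y_tr)
    scores.append(r2_score(y_te, model.predict(X_te)))

print(f"R^2 mean = {np.mean(scores):.3f}, std = {np.std(scores):.3f}")
```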
A follow-up: but wouldn't it depend on the result you're looking for? If ten people run it and one gets 0.98 while another gets 0.89, and you want the most accurate model, why not just take the most accurate one rather than the average, or is there a risk in doing that? There is a risk of overfitting. But we're all using the same data, so how can that be? Because of the randomization: if you run it later, on a different virtual machine, a different randomization may happen. The random number generator can behave differently even for the same seed, since it sometimes uses the system clock to generate the sequence, so the random numbers themselves may change if you run on a different machine or server. As one learner suggested, you could also set a threshold, keeping only the experiments with, say, an R-squared above 0.85 and rejecting the rest, but the randomization can still produce different results and you can still end up with an overfit model. So it is better to go for a more aggregated approach: these are the hyperparameters, this is how we tuned them, and this is the neighbourhood of the R-squared values we can expect, not one exact value but the neighbourhood.

So we'll stop here for today. Thank you very much, all of you, for your time and for joining; we will meet again in the next live session. Please don't forget to submit your feedback. Yes, you will get this as a recorded session on the portal. Thank you, everyone. I'll stop sharing the screen now.