Transcript for:
Machine Learning with Python: Feature Scaling

Hello Python programmers! This is the ninth video of the Machine Learning with Python series, and in this video I'm going to show you how you can do feature scaling for your data set. Before doing anything, let's understand what feature scaling actually is. In simple words, feature scaling is converting all the numerical data, which is spread over very different ranges, onto the same scale. As you can see, this data set has multiple columns like mpg, displacement, horsepower, weight and acceleration, but they all have different ranges of values: the displacement varies from about 300 to, let's say, 450, and the weight varies from about 3,430 to a little over 4,300. So we have a wide range of numerical values in our data set. In feature scaling we try to bring all this numerical data onto a common scale, for example centered around 0 with most values roughly between -1 and 1, so that all the columns have a comparable range of values.
So now we understand what feature scaling is. Let's understand why we need it. There are two main reasons. First, many machine learning algorithms (k-nearest neighbors, for example) use the Euclidean distance. Let me show you what the Euclidean distance is: this is the Euclidean formula, and you must have seen something like it in high school. For two points, it takes the difference in each coordinate, squares it, adds up those squares and takes the square root, and this is how many machine learning algorithms calculate the distance between two points. Because we square the difference between two values, if the columns of our data set have very different magnitudes (here this value is in the thousands and this one is in the tens), the column with the larger values will dominate the distance, the smaller column will barely count, and our machine learning model will not be as accurate as we want. This is why machine learning models that use the Euclidean distance require feature scaling on their data set. Now let's come to the second reason.
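To make the Euclidean argument concrete, here is a minimal sketch that computes d(p, q) = sqrt(sum_i (p_i - q_i)^2) with NumPy for two cars, before and after standardizing each column. The four [weight, acceleration] rows are made-up illustrative values, not rows from the lecture's data set, and the helper names `euclidean_distance` and `weight_share` are hypothetical.

```python
import numpy as np


def euclidean_distance(p: np.ndarray, q: np.ndarray) -> float:
    # d(p, q) = sqrt(sum_i (p_i - q_i)^2)
    return float(np.sqrt(np.sum((p - q) ** 2)))


def weight_share(p: np.ndarray, q: np.ndarray) -> float:
    # Fraction of the squared distance contributed by column 0 (weight).
    sq = (p - q) ** 2
    return float(sq[0] / sq.sum())


# Hypothetical [weight, acceleration] rows; values are made up for illustration.
cars = np.array([
    [3430.0, 12.0],
    [4300.0, 20.0],
    [3800.0, 15.0],
    [3600.0, 17.0],
])

# Unscaled: weight (thousands) swamps acceleration (tens).
raw_dist = euclidean_distance(cars[0], cars[1])
raw_share = weight_share(cars[0], cars[1])

# Standardize each column: subtract the column mean, divide by the column std.
scaled = (cars - cars.mean(axis=0)) / cars.std(axis=0)
scaled_dist = euclidean_distance(scaled[0], scaled[1])
scaled_share = weight_share(scaled[0], scaled[1])

print(f"unscaled: distance={raw_dist:.1f}, weight share={raw_share:.4f}")
print(f"scaled:   distance={scaled_dist:.2f}, weight share={scaled_share:.2f}")
```

Before scaling, almost the whole distance comes from the weight column even though the acceleration difference (12 vs 20) is large relative to its own range; after standardization the two columns contribute roughly equally.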
For models that do not use the Euclidean distance, feature scaling still helps: if the numerical data range is small, like -1 to 1, training usually takes less time (models trained with gradient descent, for example, tend to converge faster), so we save time when training on our data. So that was the theoretical part of what feature scaling is. Now let's get to a notebook and actually see how we can do feature scaling. I'm going to use the same notebook we used for the test and train split, so if you haven't watched the test and train split video, I'll provide the link in the description as well as in the i button. You can watch that first and then continue here. [Music]
Okay, so now we are in our notebook, and our first task is to import the libraries.
When I was explaining the theory part, I forgot to mention the formulas used for scaling. There are two common formulas you can use: the first is standardization and the second is normalization. Standardization subtracts the column's mean from each value and divides by the column's standard deviation; normalization subtracts the column's minimum and divides by its range (maximum minus minimum). I'm not focusing a lot on these formulas, because we just need our work done and Python takes care of the calculation: scikit-learn's StandardScaler class computes the scaling automatically for us. So we don't strictly need to memorize the formulas, but since two methods exist, I just introduced you to both of them. If you want to understand them in depth, these are the formulas, and you can work through them yourself.
So let's get back to our notebook. I haven't run this notebook yet, so let me run it first.
Now let's create a StandardScaler object. We fit this scaler on our X_train data and transform X_train with it, which gives us a scaled X_train NumPy array. Then we apply it to X_test, but here we only transform the X_test NumPy array, because the scaler has already been fitted on the training data: the test set is scaled with the mean and standard deviation learned from the training set.
Now let's see how our train and test data look. First X_train: there you can see that the values in each column now vary roughly between -1 and 1. And let's see our X_test data, just a second: X_test. There you can see that in our test set too, the values vary roughly between -1 and 1.
Now you might be wondering why we have done feature scaling only on the independent variables, X, and not on the dependent variable, y. This is because y contains categorical data. If we take a look at our dependent variable y, we can see that it has just two values: 0, which means did not buy, or 1, which means bought. We don't need to make any changes to it, because this is our output, and scaling these class labels would only distort what our model is supposed to predict. This is why we haven't done feature scaling on y.
So this was the final video on data preprocessing. We have done all the boring work, and only the juicy part is left, which is the machine learning algorithms. In the next video we'll talk about what supervised and unsupervised machine learning algorithms are, and from there we will start our machine learning journey. So this is me wrapping up, and I'll meet you in my next lecture. Bye-bye!
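The notebook steps described in this lecture can be sketched roughly as follows. This is a minimal sketch, assuming scikit-learn and NumPy are installed; the small `X` and `y` arrays are made-up stand-ins for the lecture's data set, and names such as `sc_X` and `mm` are hypothetical rather than taken from the actual notebook. It also checks StandardScaler against the standardization formula computed by hand and shows MinMaxScaler for normalization.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical data: two numeric features on very different scales
# (think weight in the thousands, acceleration in the tens) and a 0/1 target.
X = np.array([
    [3430.0, 12.0], [4300.0, 20.0], [3800.0, 15.0], [3600.0, 17.0],
    [4100.0, 13.5], [3300.0, 18.0], [3950.0, 14.0], [3700.0, 16.5],
    [4200.0, 19.0], [3500.0, 12.5],
])
y = np.array([0, 1, 0, 1, 1, 0, 0, 1, 1, 0])  # class labels, left unscaled

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Standardization: the scaler learns each column's mean and std from X_train only.
sc_X = StandardScaler()
X_train_std = sc_X.fit_transform(X_train)
# X_test is only transformed, reusing the training mean and std.
X_test_std = sc_X.transform(X_test)

# Same result as applying (x - mean) / std by hand with the training statistics.
manual = (X_train - X_train.mean(axis=0)) / X_train.std(axis=0)
print(np.allclose(X_train_std, manual))

# Normalization: (x - min) / (max - min), fitted on X_train.
# Training values land in [0, 1]; test values can fall slightly outside it.
mm = MinMaxScaler()
X_train_norm = mm.fit_transform(X_train)
X_test_norm = mm.transform(X_test)

print(X_train_std.mean(axis=0).round(6), X_train_std.std(axis=0).round(6))
print(X_train_norm.min(axis=0), X_train_norm.max(axis=0))
```

Fitting the scaler only on `X_train` keeps test-set information out of the scaling step, and `y` passes through untouched because its 0/1 values are class labels rather than measurements on a scale.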