Let's talk about logistic regression, which is probably the single most widely used classification algorithm in the world. This is something that I use all the time in my work.

Let's continue with the example of classifying whether a tumor is malignant. As before, we're going to use the label 1, or "yes", the positive class, to represent malignant tumors, and 0, or "no", the negative class, to represent benign tumors. Here's a graph of the data set, where the horizontal axis is the tumor size and the vertical axis takes on only the values 0 and 1, because this is a classification problem. You saw in the last video that linear regression is not a good algorithm for this problem. In contrast, what logistic regression will end up doing is fit a curve that looks like this, a sort of S-shaped curve, to this data set. For this example, if a patient comes in with a tumor of this size, which I'm showing on the x-axis, then the algorithm will output 0.7, suggesting that it's closer to, or maybe more likely to be, malignant than benign. We'll say more later about what 0.7 actually means in this context, but the output label y is never 0.7; it is only ever 0 or 1.

To build up to the logistic regression algorithm, there's an important mathematical function I'd like to describe, called the sigmoid function, sometimes also referred to as the logistic function. The sigmoid function looks like this. Notice that the x-axes of the graphs on the left and the right are different. In the graph on the left, the x-axis is the tumor size, so it's all positive numbers, whereas in the graph on the right, you have zero down here, and the horizontal axis takes on both negative and positive values. I've labeled the horizontal axis z, and I'm showing here just the range from negative 3 to plus 3.

The sigmoid function outputs values between 0 and 1. If I use g(z) to denote this function, then the formula is g(z) = 1 / (1 + e^(-z)), where e is a mathematical constant that takes on a value of about 2.7, and e^(-z) is that constant raised to the power of negative z. Notice that if z were really big, say 100, then e^(-z) is e^(-100), which is a tiny, tiny number. So this ends up being 1 over 1 plus a tiny little number, and the denominator is very close to 1, which is why when z is large, g(z), that is, the sigmoid function of z, is very close to 1. Conversely, you can check for yourself that when z is a very large negative number, g(z) becomes 1 over a giant number, which is why g(z) is very close to 0. That's why the sigmoid function has this shape, where it starts very close to 0 and slowly builds up, or grows, to the value of 1. Also, when z is equal to 0, e^(-z) is e^0, which is equal to 1, and so g(z) is 1 over 1 plus 1, which is 0.5. That's why the curve crosses the vertical axis at 0.5.
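If you'd like to see this behavior in code before the optional lab, here's a minimal sketch, assuming NumPy is available (the function name sigmoid is just an illustrative choice; the lab has its own implementation):

```python
import numpy as np

def sigmoid(z):
    # g(z) = 1 / (1 + e^(-z)), which maps any real number into the range (0, 1)
    return 1 / (1 + np.exp(-z))

print(sigmoid(100))   # very close to 1: e^(-100) is tiny, so the denominator is essentially 1
print(sigmoid(-100))  # very close to 0: e^(100) is huge, so the fraction is essentially 0
print(sigmoid(0))     # exactly 0.5: 1 / (1 + e^0) = 1 / 2
```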
Now let's use this to build up to the logistic regression algorithm. We're going to do this in two steps. In the first step, I hope you remember that a straight-line function, like the linear regression function, can be written as w · x + b, the dot product of w and x, plus b. Let's store this value in a variable that I'm going to call z, and this will turn out to be the same z as the one you saw on the previous slide, but we'll get to that in a minute. The next step is to take this value of z and pass it to the sigmoid function, also called the logistic function, g. Then g(z) outputs a value computed by the formula 1 / (1 + e^(-z)), which is between 0 and 1.

When you take these two equations and put them together, they give you the logistic regression model f(x) = g(w · x + b), or equivalently g(z), which is equal to 1 / (1 + e^(-(w · x + b))). This is the logistic regression model, and what it does is take as input a feature or set of features x and output a number between 0 and 1.

Next, let's take a look at how to interpret the output of logistic regression. We'll return to the tumor classification example. The way I'd encourage you to think of logistic regression's output is as the probability that the class, or the label, y will be equal to 1 given a certain input x. For example, in this application, where x is the tumor size and y is either 0 or 1, if a patient comes in with a tumor of a certain size x, and based on this input the model outputs 0.7, that means the model is predicting, or the model thinks, there's a 70 percent chance that the true label y will be equal to 1 for this patient. In other words, the model is telling us that it thinks the patient has a 70 percent chance of the tumor turning out to be malignant.

Now let me ask you a question; see if you can get this right. We know that y has to be either 0 or 1, so if y has a 70 percent chance of being 1, what is the chance that it is 0? Well, y has got to be either 0 or 1, so the probability of it being 0 and the probability of it being 1 have to add up to 1, or to a 100 percent chance. That's why if the chance of y being 1 is 0.7, or 70 percent, then the chance of it being 0 has got to be 0.3, or 30 percent.

If someday you read research papers or blog posts about logistic regression, you will sometimes see the notation f(x) = P(y = 1 | x; w, b), the probability that y equals 1 given the input features x and with parameters w and b. The semicolon here is used to denote that w and b are parameters that affect this computation of the probability of y being equal to 1 given the input feature x. For the purpose of this class, don't worry too much about what the vertical line and the semicolon mean; you don't need to remember or follow this mathematical notation for this class. I'm mentioning it only because you may see it in other places.
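As a minimal sketch of these two steps in code, again assuming NumPy (the helper name predict_probability and the specific values of w, b, and x below are made up purely for illustration; in practice, w and b would be learned from training data):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def predict_probability(x, w, b):
    # Step 1: z = w . x + b, the familiar straight-line function
    z = np.dot(w, x) + b
    # Step 2: pass z through the sigmoid to get a number between 0 and 1
    return sigmoid(z)

# Illustrative values only; w and b would normally come from training
w = np.array([0.5])
b = -2.0
x = np.array([5.0])   # e.g. a tumor of size 5, in whatever units the feature uses

p = predict_probability(x, w, b)   # interpreted as P(y = 1 | x; w, b)
print(p, 1 - p)                    # the chance that y is 1, and the chance that y is 0
```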
In the optional lab that follows this video, you'll also get to see how the sigmoid function is implemented in code, and you'll see a plot that uses the sigmoid function to do better on the classification task that you saw in the previous optional lab. Remember that the code will be provided to you, so you just have to run it. I hope you take a look and get familiar with the code.

So congrats on getting here. You now know what the logistic regression model is, as well as the mathematical formula that defines logistic regression. For a long time, a lot of internet advertising was actually driven by basically a slight variation of logistic regression. This was very lucrative for some large companies, and this was basically the algorithm that decided what ad was shown to you and many others on some large websites. Now, there's even more to learn about this algorithm. In the next video, we'll take a look at the details of logistic regression. We'll look at some visualizations and also examine something called the decision boundary. That will give you a few different ways to map the numbers that this model outputs, such as 0.3 or 0.7 or 0.65, to a prediction of whether y is actually 0 or 1. So let's go on to the next video to learn more about logistic regression.