Transcript for:
Understanding ROC and AUC Concepts

wait till you see the roc and the a you see they're cool yeah stack quest hello i'm josh starmer and welcome to statquest today we're going to talk about roc and auc and they're going to be clearly explained note this stat quest builds on the confusion matrix and sensitivity and specificity stat quests so if you're not already down with those check out the quests also the example i give in this stat quest is based on logistic regression so even though roc and auc apply to more than just logistic regression make sure you understand those basics let's start with some data the y-axis has two categories obese and not obese the blue dots represent obese mice and the red dots represent mice that are not obese along the x-axis we have weight this mouse is not obese even though it weighs a lot it must be mighty mouse and just full of muscles this mouse doesn't weigh that much but it is still considered obese for its size now let's fit a logistic regression curve to the data when we're doing logistic regression the y-axis is converted to the probability that a mouse is obese now let's just look at the curve if someone told us that they had a heavy mouse that weighs this much then the curve would tell us that there is a high probability that the mouse is obese if someone told us that they had a light mouse that weighs this much then the curve would tell us that there is a low probability that the mouse is obese so this logistic regression tells us the probability that a mouse is obese based on its weight however if we want to classify the mice as obese or not obese then we need a way to turn probabilities into classifications one way to classify mice is to set a threshold at 0.5 and classify all mice with the probability of being obese greater than 0.5 as obese and classify all mice with the probability of being obese less than or equal to 0.5 as not obese using 0.5 as the cut off we would call this mouse obese and this mouse not obese if another mouse weighed this much then we would classify it as obese and if another mouse weighed this much then we would classify it as not obese to evaluate the effectiveness of this logistic regression with the classification threshold set to 0.5 we can test it with mice that we know are obese and not obese here are the weights of four new mice that we know are not obese and here are the weights of four new mice that we know are obese we know that this mouse is not obese and the logistic regression with the classification threshold set to 0.5 correctly classifies it as not obese this mouse is also correctly classified but this mouse is incorrectly classified we know that it is obese but it is classified as not obese the next mouse is correctly classified but this mouse is incorrectly classified the last three mice are correctly classified now we create a confusion matrix to summarize the classifications these three samples were correctly classified as obese and this sample was predicted to be obese but was not obese these three samples were correctly classified as not obese and this sample was predicted to be not obese even though it was obese once the confusion matrix is filled in we can calculate sensitivity and specificity to evaluate this logistic regression when 0.5 is the threshold for obesity little bam because so far this is all review now let's talk about what happens when we use a different threshold for deciding if a sample is obese or not for example if it was super important to correctly classify every ob sample we could set the threshold to 0.1 this would result in correct classifications for all four obese mice but it would also increase the number of false positives the lower threshold would also reduce the number of false negatives because all of the obese mice were correctly classified note if the idea of using a threshold other than 0.5 is blowing your mind imagine that instead of classifying samples as obese or not obese we were classifying samples as infected with ebola and not infected with ebola in this case it's absolutely essential to correctly classify every sample infected with ebola in order to minimize the risk of an outbreak and that means lowering the threshold even if that results in more false positives on the other hand we could set the threshold to 0.9 in this case we would correctly classify the same number of ob samples as when the threshold was set to 0.5 but we wouldn't have any false positives and we would correctly classify one more sample that was not obese and have the same number of false negatives as before with this data the higher threshold does a better job classifying samples as obese or not obese but the threshold could be set to anything between 0 and 1. how do we determine which threshold is the best for starters we don't need to test every single option for example these thresholds result in the exact same confusion matrix but even if we made one confusion matrix for each threshold that mattered it would result in a confusingly large number of confusion matrices so instead of being overwhelmed with confusion matrices receiver operator characteristic roc graphs provide a simple way to summarize all of the information the y-axis shows the true positive rate which is the same thing as sensitivity the true positive rate is the true positives divided by the sum of the true positives and the false negatives in this example the true positives are the samples that were correctly classified as obese and the false negatives are the ob samples that were incorrectly classified as not obese the true positive rate tells you what proportion of ob samples were correctly classified the x-axis shows the false positive rate which is the same thing as one minus specificity the false positive rate is the false positives divided by the sum of the false positives and true negatives the false positives are the non-ob samples that were incorrectly classified as obese and the true negatives are the samples correctly classified as not obese the false positive rate tells you the proportion of not ob samples that were incorrectly classified and are false positives to get a better sense of how the roc works let's draw one from start to finish using our example data we'll start by using a threshold that classifies all of the samples as obese and that gives us this confusion matrix first let's calculate the true positive rate there are four true positives and there were zero false negatives doing the math gives us one the true positive rate when the threshold is so low that every single sample is classified as obese is one this means that every single ob sample was correctly classified now let's calculate the false positive rate there were four false positives in the confusion matrix and there were zero true negatives doing the math gives us one the false positive rate when the threshold is so low that every single sample is classified as obese is also one this means that every single sample that was not obese was incorrectly classified as obese now plot a point at one comma one a point at one comma one means that even though we correctly classified all of the ob samples we incorrectly classified all of the samples that were not obese this green diagonal line shows where the true positive rate equals the false positive rate any point on this line means that the proportion of correctly classified ob samples is the same as the proportion of incorrectly classified samples that are not obese going back to the logistic regression let's increase the threshold so that all but the lightest sample are called obese the new threshold gives us this confusion matrix we then calculate the true positive rate and the false positive rate and plot a point at 0.75 comma 1. since the new point is to the left of the dotted green line we know that the proportion of correctly classified samples that were obese is greater than the proportion of samples that were incorrectly classified as obese in other words the new threshold for deciding if a sample is obese or not is better than the first one now let's increase the threshold so that all but the two lightest samples are called obese the new threshold gives us this confusion matrix we then calculate the true positive rate and the false positive rate and plot a point at 0.5 comma 1. the new point is even further to the left of the dotted green line showing that the new threshold further decreases the proportion of samples that were incorrectly classified as obese in other words the new threshold is the best one so far now we increase the threshold again create a confusion matrix calculate the true positive rate and the false positive rate and plot the point now we increase the threshold again create a confusion matrix calculate the true positive rate and the false positive rate and plot the point the threshold represented by the new point correctly classifies 75 percent of the ob samples and 100 percent of the samples that were not obese in other words this threshold resulted in no false positives now we increase the threshold again and plot the point now we increase the threshold again and plot the point lastly we choose a threshold that classifies all of the samples as not obese and plot the point the point at zero comma zero represents a threshold that results in zero true positives and zero false positives if we want we can connect the dots and that gives us an roc graph the roc graph summarizes all of the confusion matrices that each threshold produced without having to sort through the confusion matrices i can tell that this threshold is better than this threshold and depending on how many false positives i'm willing to accept the optimal threshold is either this one or this one bam now that we know what an roc graph is let's talk about the area under the curve or auc the auc is 0.9 bam the auc makes it easy to compare one roc curve to another the auc for the red roc curve is greater than the auc for the blue roc curve suggesting that the red curve is better so if the red roc curve represented logistic regression and the blue roc curve represented a random forest you would use the logistic regression double bam now one last thing before we're all done although roc graphs are drawn using true positive rays and false positive rates to summarize confusion matrices there are other metrics that attempt to do the same thing for example people often replace the false positive rate with precision precision is the true positives divided by the sum of the true positives and false positives precision is the proportion of positive results that were correctly classified if there were lots of samples that were not obese relative to the number of ob samples then precision might be more useful than the false positive rate this is because precision does not include the number of true negatives in its calculation and is not affected by the imbalance in practice this sort of imbalance occurs when studying a rare disease in this case the study will contain many more people without the disease than with the disease bam in summary roc curves make it easy to identify the best threshold for making a decision this threshold is better than this one and the auc can help you decide which categorization method is better the red method is better than the blue method hooray we've made it to the end of another exciting stat quest if you like this stat quest and want to see more please subscribe and if you want to support statquest well consider getting a t-shirt or a hoodie or buying one or two of my original songs the links for doing that are below alright until next time quest on