Transcript for:
Inference with Generalized Linear Models

For inference with generalized linear models, you have already covered the use of a Wald test to test the significance of a single regression parameter, and you have covered the deviance test statistic to compare nested models in the Poisson regression setting. You will now go beyond this setting and also cover the use of drop-in-deviance tests with more general GLMs.

So here are the preliminaries. In a generalized linear model, the response or outcome y_i of policyholder i is estimated or predicted with an estimate of its expected value, that is, mu-hat_i for observation i. Now consider two extreme positions. On the one hand, in the so-called null model, each response gets the same estimate or prediction: the global average, denoted y-bar. This is of course an overly simple model, which uses only one parameter and is therefore too simple to model or explain your data. On the other hand, in the saturated or full model, each response is estimated by its actual value, so in a sample of size n this approach uses n estimates and n parameters. The model replicates the data instead of explaining them: it clearly overfits the data and picks up not only the trends but also the noise in the data set.

The so-called deviance statistic is then a likelihood ratio test statistic that balances the regression model under consideration against the full or saturated model. To see its general specification, let us start again from the log-likelihood of a generalized linear model in the general notation developed in this deck of slides. Using the relation between the parameter theta_i and the mean mu_i of response or observation i in the general framework, you replace theta_i with the inverse of b-prime evaluated in mu_i. This substitution allows you to write down the log-likelihood of the GLM in terms of its mean parameter mu_i; this is the so-called mean parameterization of your generalized linear model.

Now, for a specific regression model under consideration, write the corresponding log-likelihood as ell evaluated in mu-hat, and you want to compare this attained log-likelihood for the model under investigation against the log-likelihood of the full model, where each observation is replicated, thus mu-hat_i equals y_i, the observation itself. The likelihood ratio test statistic of interest compares the log-likelihood of the regression model under consideration against the log-likelihood of the full model, that is, the maximum value of the likelihood that can be achieved with a model that is overfitting your data. You can see at the top of the sheet how this likelihood ratio test statistic can be written. Note that we write down this statistic as a random variable, and for that reason we use capital Y to denote the responses as random variables. What you do here is take minus two times the logarithm of the likelihood of the model under investigation divided by the likelihood of the saturated model; that is in fact what you see here in the difference highlighted in blue. Now, for the function a of the dispersion parameter phi, we often use the expression phi divided by an observation-specific weight w_i. The expression derived this way is the so-called scaled deviance statistic: it is the deviance divided by the dispersion parameter phi. That is what you get if you construct the likelihood ratio test statistic that compares the model under investigation against the full model.
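Since the slides referenced here are not reproduced in the transcript, here is a hedged restatement of the quantities just described, in standard exponential-family GLM notation (the symbols b, theta_i, mu_i, w_i and phi follow the verbal description above; the slide's own notation may differ):

```latex
% Exponential-family log-likelihood in the mean parameterization, obtained
% by substituting theta_i = (b')^{-1}(mu_i) and a(phi) = phi / w_i:
\ell(\boldsymbol{\mu}; \boldsymbol{y})
  = \sum_{i=1}^{n}
    \frac{y_i \,(b')^{-1}(\mu_i) - b\big((b')^{-1}(\mu_i)\big)}{\phi / w_i}
  + \sum_{i=1}^{n} c(y_i, \phi)

% Scaled deviance: minus two times the log of the likelihood ratio of the
% model under investigation against the saturated model (mu-hat_i = Y_i);
% the c(Y_i, phi) terms cancel in the difference:
D^{*}(\boldsymbol{Y}, \hat{\boldsymbol{\mu}})
  = -2 \big[ \ell(\hat{\boldsymbol{\mu}}; \boldsymbol{Y})
           - \ell(\boldsymbol{Y}; \boldsymbol{Y}) \big]
  = \frac{D(\boldsymbol{Y}, \hat{\boldsymbol{\mu}})}{\phi}
```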
Do note that in the setting of Poisson regression, the expressions for deviance and scaled deviance are the same. They are equal because the dispersion parameter in our Poisson regression example is fixed: it equals 1. However, as you can see here, in a general setting the dispersion parameter is unknown and therefore has to be estimated, and the scaled deviance reflects this.

Before moving on to other useful statistics and the inferential tools based on them, you will derive here the deviance and scaled deviance test statistics for one particular example of a generalized linear model, namely the normal linear regression model. The log-likelihood, as a function of the mean mu_i of observation i and the variance parameter sigma squared, is printed at the top of the sheet. Recall that mu_i, the mean of observation i, is expressed as the vector x_i transpose times the vector beta of regression parameters; thus this GLM uses the identity link function, and the dispersion parameter here equals the variance parameter sigma squared, as we also derived earlier in the course. The scaled deviance statistic then becomes the sum of the squared differences between the response y_i and the estimated mean mu-hat_i, scaled by the dispersion parameter sigma squared. If useful, a weight w_i could also be included in this expression. So what you get is a sum of squares, namely the sum of squared residuals divided by sigma squared; that is the scaled deviance statistic if you work out this formula. The deviance test statistic in this example then simply corresponds to the well-known sum of squares: the sum of the squared residuals, or the sum of the squared differences between the responses and the fitted values. Again, an observation-specific weight w_i can be included here, which would then lead to a so-called weighted sum of squares. To conclude this example: if we write down the scaled deviance and deviance test statistics for the normal linear regression setting, we get something very familiar, and we recognize the residual sum of squares.

Next to the deviance and scaled deviance test statistics, another useful statistic for inference in the GLM setting is the Pearson statistic, denoted here with chi-square. This statistic is defined as the sum of the squared differences between the response y_i and its mean as estimated by the regression model under consideration, where each squared difference is divided by the variance function evaluated in mu-hat_i. Recall that earlier in this course we specified this variance function V, evaluated in mu, as the part of the variance that captures how the variance changes with the mean, how the variance depends on it; that is what you see appearing here.

Putting this into practice, let us consider once again the example of normal linear regression. The variance in this particular regression model does not depend on the mean: it is a fixed variance. Indeed, this type of regression implies the assumption of homoscedasticity, or constant variance, denoted here with sigma squared. The Pearson statistic then reduces to the sum of squared residuals and thus equals the expression we just derived for the deviance test statistic: in the case of normal linear regression, both the Pearson statistic and the deviance statistic coincide.
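As a small illustration, not part of the original lecture, the following R sketch checks this coincidence numerically on simulated data:

```r
# Minimal R sketch with simulated data: for a Gaussian GLM (identity link),
# deviance, Pearson statistic and residual sum of squares all coincide.
set.seed(1)
n <- 100
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)

fit <- glm(y ~ x, family = gaussian())

deviance(fit)                            # deviance D
sum(residuals(fit, type = "pearson")^2)  # Pearson chi-square statistic
sum((y - fitted(fit))^2)                 # residual sum of squares: all equal
```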
It turns out that, asymptotically, one can rely on interesting distributional results for both the deviance and the Pearson statistic. These distributional results are relevant for deriving inferential tools for GLMs later on in this video. For normal linear regression you may recall the distribution of the sum of squared residuals: this sum of squares is distributed as sigma squared, the variance, times a chi-square distribution with n minus (p plus 1) degrees of freedom. Here n is the number of observations in the sample and p plus 1 is the number of estimated regression parameters: an intercept plus p regression parameters to capture the effects of the covariates in the linear predictor. This result generalizes to the GLM setting as follows: both the deviance and the Pearson statistic have an asymptotic distribution that equals the dispersion parameter phi of the GLM times a chi-square distribution with n minus (p plus 1) degrees of freedom. That is a useful result.

You are now ready to derive the so-called drop-in-deviance test as a formal tool for inference with generalized linear models. This test compares nested models, where the model in the null hypothesis is a simplification of the model in the alternative hypothesis, obtained by putting q regression parameters equal to zero. The drop-in-deviance test statistic is then defined as follows: it is the difference, and therefore the drop, between the scaled deviance of the reduced model (the model under the null hypothesis) and the scaled deviance of the larger model (the model under the alternative hypothesis). You can also see this as the difference between the sum of squared deviance residuals from the reduced model under the null and the sum of squared deviance residuals from the bigger model under the alternative, appropriately scaled. This last bullet in fact reveals an interesting similarity with the extra-sum-of-squares F test that we know from normal linear regression; however, in a GLM context a new type of residual is used, in this case the deviance residual. The deviance of a regression model under investigation equals the sum of its squared deviance residuals, and that is where the definition of the deviance residual comes from; more on this later in the video.

You will now develop the drop-in-deviance test statistic as an example of a likelihood ratio test statistic. Denote with ell_H0 the log-likelihood as maximized under the null hypothesis, and let ell_H1 be the log-likelihood as obtained under the alternative. The likelihood ratio test statistic, which compares the likelihood under the null hypothesis with the likelihood under the alternative, can then be written as you see here on the sheet; in this notation we use S and ell_S to refer to the saturated or full model and its corresponding log-likelihood. What you recognize here is the scaled deviance of the model under the null hypothesis minus the scaled deviance of the model under the alternative hypothesis, and that is why we refer to this statistic as a drop-in-deviance test statistic.
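A hedged restatement of that derivation, in the notation introduced above (the slide itself is not part of the transcript): adding and subtracting the saturated log-likelihood turns the likelihood ratio into a difference of scaled deviances.

```latex
% Likelihood ratio statistic for H0 (reduced model) against H1 (larger
% model), rewritten via the saturated log-likelihood ell_S:
-2 \log \frac{L_{H_0}}{L_{H_1}}
  = -2 \big[ \ell_{H_0} - \ell_{H_1} \big]
  = -2 \big[ \ell_{H_0} - \ell_S \big] + 2 \big[ \ell_{H_1} - \ell_S \big]
  = D^{*}_{H_0} - D^{*}_{H_1}
```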
The drop in deviance is thus the test statistic that will be used to decide whether you continue with the reduced model from the null hypothesis or with the larger model from the alternative hypothesis. How are you going to take this decision? Intuitively, a small realized value of the drop-in-deviance test statistic suggests that the likelihoods realized under the null and under the alternative are approximately equal, and that implies that the reduced model specified in the null hypothesis explains the data as well as the larger model from the alternative does. A realized value of this drop in deviance that is large, however, implies that the likelihood maximized under the alternative is much larger than the likelihood realized under the null; thus the reduced model is inadequate in explaining the data. You will formally evaluate whether the observed drop in deviance is small or large by comparing its value with a quantile from a chi-square distribution with q degrees of freedom. Note that this result only holds when the dispersion parameter phi is known and does not require estimation; this is the case, for instance, with Poisson regression, where phi equals 1.

But what would you do in case the dispersion parameter phi is unknown? You will then have to estimate it, and a possible estimator is the following: use the deviance statistic obtained with the larger model under the alternative hypothesis and scale it by dividing by its degrees of freedom; in a model with p regression parameters and an intercept, and a sample of size n, these degrees of freedom equal n minus (p plus 1). When phi needs to be estimated, you will work with the following test statistic: you take the ratio of, on the one hand, the drop in deviance divided by q, and on the other hand, the estimator for the dispersion parameter phi. Note that q is the difference in the number of parameters between the regression model under the alternative and the regression model under the null hypothesis; typically it is the number of parameters that you put equal to zero in your null hypothesis. To decide whether the null hypothesis should be rejected or not, you compare the observed value of this F test statistic with an F distribution with q and n minus (p plus 1) degrees of freedom. That is, if the observed value of your statistic is larger than a high quantile from the F distribution, then the null hypothesis cannot be accepted: the parameters considered in the null hypothesis should not be put to zero, and the larger model is to be preferred. Do note the analogy between this test and the partial F test, which we know from normal linear regression models.
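A minimal R sketch of both variants of the test on simulated data (the variable names claims, sev, age and region are illustrative, not from the lecture):

```r
# Drop-in-deviance tests in R on simulated data. For Poisson (phi = 1) the
# difference of deviances is compared to a chi-square with q df; for Gamma
# (phi unknown) the dispersion is estimated and an F test is used instead.
set.seed(2)
n      <- 200
age    <- rnorm(n)
region <- factor(sample(c("A", "B", "C"), n, replace = TRUE))

# Poisson: chi-square version of the drop-in-deviance test
claims <- rpois(n, lambda = exp(0.2 + 0.3 * age))
fit0   <- glm(claims ~ age,          family = poisson())  # reduced (H0)
fit1   <- glm(claims ~ age + region, family = poisson())  # larger (H1)
anova(fit0, fit1, test = "Chisq")

# Gamma: phi unknown, F version of the test
sev   <- rgamma(n, shape = 2, rate = 2 / exp(1 + 0.5 * age))
gfit0 <- glm(sev ~ age,          family = Gamma(link = "log"))
gfit1 <- glm(sev ~ age + region, family = Gamma(link = "log"))

# Deviance-based dispersion estimator described in the lecture, D / (n - (p + 1));
# note that anova() estimates phi internally (Pearson-based by default):
phi_hat <- deviance(gfit1) / df.residual(gfit1)
anova(gfit0, gfit1, test = "F")
```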
A final item that we wish to cover in our discussion of GLMs is the notion of types of residuals specifically defined for generalized linear models. Recall that in regression models, residuals are typically explored to assess the adequacy of the fit of a model: is the variance assumption adequate, is the link function that is used a suitable and well-fitting choice, are the covariates included in the linear predictor in an appropriate way, do we need to take a transformation of a certain covariate before plugging it into the linear predictor, and so on. Moreover, inspection of residuals may also reveal the presence of unusual observations. So it is clear that residuals matter in regression models, but for GLMs an extended definition of residuals is required, one that is applicable to all distributions from the exponential family. A first type of residual used in this context is the so-called Pearson residual. These residuals take the difference between the response and the fitted value mu-hat, but scale it with the square root of the variance function evaluated in this fitted value, so that the raw residuals are appropriately scaled.

A particular feature of these Pearson residuals is that if you take the sum of the squared Pearson residuals, you get the Pearson statistic as defined earlier in this video. A similar observation holds for the deviance residuals: they are defined in such a way that the sum of the squared deviance residuals equals the deviance of the model under investigation. That is how you should picture those deviance residuals. A detailed derivation of the expression for the deviance residuals in the case of Poisson regression is given on this sheet; it relies on the expressions derived earlier in this video. Both Pearson and deviance residuals can be extracted easily in R software after calibrating a GLM.

That concludes the video. We covered some inferential tools for decision making with generalized linear models, and we discussed different types of residuals, which do matter in the framework of generalized linear models. We will now continue with a demonstration of fitting a generalized linear model with R.
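As a hedged pointer toward that demonstration (simulated Poisson data, not the lecture's own example), extracting both residual types in R and checking the two sum-of-squares identities discussed above looks like this:

```r
# Extracting residuals after calibrating a GLM, and verifying that the sums
# of squared Pearson / deviance residuals recover the Pearson statistic and
# the deviance, respectively (simulated Poisson data).
set.seed(3)
x   <- rnorm(150)
y   <- rpois(150, lambda = exp(0.3 + 0.5 * x))
fit <- glm(y ~ x, family = poisson())

r_pearson  <- residuals(fit, type = "pearson")
r_deviance <- residuals(fit, type = "deviance")

sum(r_pearson^2)   # Pearson chi-square statistic
sum(r_deviance^2)  # equals deviance(fit)
deviance(fit)
```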