Transcript for:
Understanding Inferential Tests in Statistics

Hello everyone. In this video we'll be talking about different inferential tests, the basic ones you can use to test for differences. First we'll talk about how we understand and imagine relationships between variables in statistics. Then we'll look at the things we consider in choosing a specific inferential statistic to test whether a specific relationship exists. After that we'll go through tests for difference between two groups, tests for difference among three or more groups, and tests for difference when we have repeated measures. When we talk about relationships between and among variables in statistics, we are actually looking for covariation between the variables. For example, is there a pattern that links variable A with variable B? Do they behave in a way that makes us think they have a similar pattern, that they could be moving the same way or moving in a specific direction? For instance, when the value of one variable increases, does the value of another variable decrease? We can imagine this with different types of variables. Think of, let's say, family belongingness and religiosity, both measured as continuous variables on a scale. Across the different respondents, is there a pattern in the way the values behave? Will an increase in family belongingness be related to, or covary with, an increase in religiosity? That is basically what we're looking for in testing relationships: we want to see if there is a pattern in how the variables move from one person to another in the given sample. We can also imagine covariation or relationship when there is a categorical variable. For example, do we see higher values of family belongingness or religiosity among females or among males?
The only difference is that when one variable is categorical and the other is continuous, we usually describe that relationship in terms of difference. Difference meaning: is there a difference in the way the continuous variable behaves when you group the sample by one level of the category versus another? In this case, the question would be: is there a significant difference in family belongingness when you group the sample according to sex at birth, females versus males? Now, there is a plethora of statistical analyses that we can use to make an inference about the patterns of relationships between and among variables; that's why they're called inferential statistics. And there are certain questions we need to ask and answer before we can say, "I'm going to use a t-test or an ANOVA or a Spearman or a Mann-Whitney U, or I'm going to use a regression or Pearson," etc. The first question is: how many dependent or outcome variables do we have in the specific relationship we're testing? In our practice, at least, we will be dealing with relationships that have only one dependent or outcome variable, because those with more than one outcome variable call for a higher level of statistics. But that's the first question. If there's only one outcome variable in the relational model you want to test, there's a specific set of statistics you can use. If there's more than one, there's a different set of tests. For example, with a single outcome you can use an ANOVA; with more than one outcome you can use a MANOVA. But we will not be discussing MANOVA in this class.
We'll just be discussing the ANOVA. The next question is: how are the dependent or outcome variables measured? Again, in a relationship you have at least two variables. One of those is the outcome variable, and the other is the independent variable or predictor. A very important question is the nature, or level of measurement, of the outcome variable. Basically there are two answers here: is it a categorical variable or is it a continuous variable? If it's continuous, you have a plethora of statistics as options, depending on your answers to the next questions: ANOVA, t-test, Pearson's r, Spearman's rho, etc. When the outcome is categorical, you have fewer options. You can go for a chi-square or a logistic regression, but we will not be covering those tests in our class, because most of the time in social statistics our outcome variables are continuous, just like the variables you are working on in your sample values. The next question is: how many independent or predictor variables do you have in your study?
When you have more than one independent variable, it's more complex. It means you're looking at more than two variables, one outcome and more than one independent variable, and that needs a somewhat higher level of statistics: you will need a multivariate test. But if it's just one independent variable vis-a-vis one dependent variable, that's bivariate. So that's one classification of inferential statistics: if you are dealing with only two variables at a time, that's bivariate; if the test can deal with more than two variables at a time, that's multivariate. Ideally, once you're at an intermediate level in your statistics, you go towards multivariate tests, because with just one test you can already make multiple inferences from the patterns of the values in your variables, compared to a bivariate test. But bivariate is more fundamental. It's what you have to learn first, and it's what most of our inferential statistics will be in this class. The next question, also very important, is the nature of the independent variables. Like I said, we'll be working mostly with continuous outcome variables, but we'll be working with different types of independent variables. Some independent variables will be categorical with two categories, so dichotomous, or with three or more categories, so that's multivariate, sorry, multinomial. That's different. And then what if your independent variable is also a continuous variable? So there's a specific test you will run based on the nature of the independent variable and the nature of the dependent variable.
The next question applies if the independent variable is categorical. Again, it can be continuous or categorical. If it's categorical, are there only two groups? Because there is a specific set of statistics for an independent variable with two groups, and another specific set for three or more categories. And then, if the IV or predictor is categorical, are matched, same, or different participants used in each category? For example, we have what we call repeated measures. With repeated measures, you measure the same variable at two or more points in time. For example, if I measure depression prior to the pandemic, then go back to the same participants and measure their depression during the quarantine, and then measure their depression after the quarantine, that's called repeated measures. The categories are not really categorizing people; they are categories of the time at which they are measured, but we are talking about the same participants. And then, of course, there are designs where the participants are not matched or not the same. For example, we measure depression scores at a single point in time, but we differentiate the group of males and females, or the groups of first year, second year, third year, and fourth year. Those are different: in the first case we're differentiating the measures of depression pre-, during, and post-lockdown, versus differentiating depression between males and females. The first one is what we call a within-group
analysis, meaning these are your participants, and you measure them pre-lockdown, during lockdown, and after lockdown. The other one is a between-group analysis. For example, you have your depression scale and then you have males versus females, so two different groups, whereas the other one is the same group but different measurements. And then finally, does the data meet the assumptions for parametric tests? We taught you earlier how to check that: is it normally distributed? Are there issues with skewness or kurtosis? After you answer all of these questions, you will usually end up with two options. The first is a parametric option and the second is a non-parametric option. Let's work through an example. Before I continue, let me just clean my board first. Okay, allow me to continue. Let's say I'm going to measure the relationship between depression, as depression scores, and academic performance. Academic performance is measured via grades, numerical grades: 80, 90, not the uno, tres, cuatro at La Salle, but the 80, 90 kind. So how many dependent variables? Only one. And how is the dependent variable measured? As I said, it's 81, 90, 95, so it's equidistant, and so it is a continuous variable. Next, how many IVs or predictor variables? In this hypothesis we only have one, which is depression score. And how is the IV or predictor measured? Depression, since it's a score, is measured on a scale, so it's also continuous. So we will no longer answer the question about a categorical IV or predictor, because we've already established that it's continuous.
And this one we will also not answer, because the answer here is continuous. The next question is: do we meet the assumptions for a parametric test? If you have just one independent variable and one dependent variable, and they're both continuous, you're left with two options to test the relationship. The first is the parametric one, and we will learn this next week: the Pearson r correlation. If it's non-parametric, meaning the data did not meet the assumption of normality, then we use Spearman's rho. So that's how you use this guideline to find out which test you're going to use. Now let's go to the tests for difference. For the tests for difference, our independent variable is usually categorical, and the issue is how many categories you have: two categories, or three or more? Let's start with two categories and a continuous dependent variable. What is an example of a research question or hypothesis that would warrant this type, the one with two categories? For example, what is the relationship between gender, measured as male versus female, and life satisfaction, which is a continuous variable? Or, if we articulate that as a hypothesis like you did in your previous homework: is there a significant difference in life satisfaction when grouped according to gender? And gender has two groups. So when the independent variable is dichotomous, meaning it has two categories, the dependent variable is continuous, and we have met the parametric requirements, meaning the data is normally distributed, we usually use an independent t-test.
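As a side note on the Pearson versus Spearman choice in the depression and grades example, here is a minimal Python sketch of how the two coefficients relate: Spearman's rho is Pearson's r computed on ranks. The scores are made up for illustration, the function names are hypothetical (they are not from jamovi or any library), and the sketch computes only the coefficients, not p-values.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r applied to the ranks of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical data: depression scores and numerical grades for six students.
depression = [5, 8, 12, 15, 20, 22]
grades = [95, 92, 90, 85, 84, 80]

print("Pearson r:", round(pearson_r(depression, grades), 3))
print("Spearman rho:", round(spearman_rho(depression, grades), 3))
```

In this made-up sample the grades fall every time depression rises, so the rank-based coefficient is exactly -1 while Pearson's r is slightly weaker, because the drop is not perfectly linear.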
Sorry, one addition on normal distribution. Some more conservative researchers still go for a non-parametric test even if the data is normally distributed, because the sample was not collected via random sampling. For them, without random sampling it's no longer really parametric. But there are more progressive or more liberal statisticians who say that as long as you meet the assumption of normality, whatever your sampling technique was, you're fine. So what are the values you need to check in your statistical output when you do your t-test? First, check whether the p-value is less than 0.05. As we discussed previously, if it's less than 0.05, that's when we reject the null hypothesis. So that's the first thing you check. Later, I will demonstrate how to do it in jamovi so that you will not have to use this very cumbersome formula. I will not teach you the formula; I will teach you how to run a t-test in jamovi and how to interpret the tables in the t-test output. So the first value you need to check is the p-value. If the p-value is less than 0.05, you reject the null hypothesis. Then you report the t-statistic. You also have to indicate the mean and standard deviation per group. So there is an overall mean and standard deviation of life satisfaction
as a whole, but you also have to see the mean and standard deviation of males versus females. If there is a significant difference, the next question is who is significantly higher: is it the group of males or the group of females? What answers that is the mean and SD of each group, and there is also a command for this in jamovi. You also have to look at Cohen's d: click on Cohen's d and look at its value to identify the effect size. In our department we usually don't report Cohen's d anymore, but I'm teaching you this because you may be publishing in the future and some editors will need to see it. There are three benchmarks for interpreting Cohen's d as the effect size of an independent t-test: around 0.2 is a small effect, around 0.5 is a medium effect, and around 0.8 is a large effect. Now, if your IV is dichotomous, like male versus female, and your dependent variable is continuous, like life satisfaction, but you did not meet the requirements of a parametric test, meaning the data is not normally distributed or you only collected via convenience sampling, then you have to do a non-parametric test. The non-parametric version of the independent t-test is Mann-Whitney U. With Mann-Whitney U, what values do you have to check? Just as with any hypothesis test, look at the p-value. If the p-value is less than 0.05, you reject the null hypothesis. And you have to report the U statistic, because what Mann-Whitney U generates is a U statistic, along with a p-value that tells you whether or not to reject the null hypothesis. And then you also have to...
report the median per group. That way, in case there's a significant difference, you can get insight into who is higher, the males or the females. Now, what if your independent variable has three or more categories? Let's give an example. What is the relationship between income bracket, with income bracket being low, middle, or high, so three categories, and life satisfaction, a continuous variable? How do we articulate that as a hypothesis or a question? Is there a significant difference in life satisfaction when grouped according to income bracket? In this case, your independent variable is multinomial, meaning it has three or more categories. One example would be income bracket. Another would be first year, second year, third year, fourth year. By the way, there is again an argument here. Some would say that first year, second year, third year, fourth year is an ordinal variable, so you can use the non-parametric version of Pearson r for that, which is Spearman's rho. But there are also conservative researchers who would say that even if it's an ordinal variable like year level, you have to treat it as a categorical variable. I'm saying this so you're prepared, because some researchers will make one decision and some will make the other. In this case, though, with low, medium, high, it's better to treat it as a categorical variable even if there is an obvious rank. The DV, the life satisfaction scale, is continuous. So what do we use? If it is a parametric test, meaning you can assume that there is normality,
then you can use what we call a one-way analysis of variance, or ANOVA. What values do you need to check after generating your ANOVA? First, of course, is the p-value. If it's less than 0.05, you can reject the null hypothesis. You have to report the F statistic. So again: what comes out of the t-test is the t statistic, what comes out of Mann-Whitney U is the U statistic, and what comes out of a one-way analysis of variance is the F statistic. It would also be good to report the mean and SD per group, and then you also do a post hoc analysis. When do you do a post hoc analysis? For example, say there is a significant difference in life satisfaction when grouped according to income bracket: low, medium, high. The next question is, which of these groups actually differ? Is it low versus high, low versus medium, medium versus high, or all of them? To answer that follow-up question, you do what we call a post hoc analysis, and you can use either Tukey or Bonferroni. Those are options you can find in jamovi; I will teach you that later. Next, suppose your IV is multinomial, so three or more categories, like low, medium, and high income, and you have continuous life satisfaction scores, but you did not meet the parametric requirements, meaning the distribution is not normal. Then what you will use is a Kruskal-Wallis test. What values do you need to check? If the p-value is less than 0.05, then as usual you reject the null hypothesis and infer that there is a significant difference across the income brackets in terms of life satisfaction. And the statistic you will report is the chi-square (χ²).
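To make the chi-square statistic reported for Kruskal-Wallis less of a black box, here is a minimal Python sketch of how it can be computed by hand: pool the scores, rank them, and compare the rank sums per group. The income-bracket scores are invented for illustration and the function names are hypothetical. The p-value line relies on the chi-square approximation with 2 degrees of freedom, so it applies only to the three-group case and is rough for samples this small; jamovi's output should be treated as the reference.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with the usual correction for ties."""
    pooled = [score for group in groups for score in group]
    n = len(pooled)
    ranks = average_ranks(pooled)

    # Sum of (rank sum squared / group size) over the groups.
    total = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)

    # Tie correction: equals 1 when no scores are tied.
    counts = {}
    for score in pooled:
        counts[score] = counts.get(score, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all scores are identical; H is undefined")
    return h / correction

# Hypothetical life satisfaction scores by income bracket.
low = [12, 15, 14]
middle = [18, 20, 19]
high = [25, 27, 26]

h = kruskal_wallis_h([low, middle, high])
# For exactly three groups (df = 2), the chi-square tail probability is exp(-H / 2).
p_approx = math.exp(-h / 2)
print("H (chi-square):", round(h, 3), "approximate p:", round(p_approx, 3))
```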
Then you identify the mean and standard deviation per group, and you also do a post hoc analysis in case you need pairwise comparisons: again, low versus medium, medium versus high, low versus high, or whether all of the pairs differ. Remember, you only do a post hoc analysis when the p-value is less than 0.05. If it's not significant, there's no point in doing a post hoc analysis. The next type of test for difference is different. The previous two are between-groups differences, meaning we look at the value of a given outcome variable when grouped according to different categories. But in this question, "Is there a significant difference in life satisfaction pre- and post-lockdown?", we are not differentiating life satisfaction across groups within the larger sample. It is within the same group; it's just that there is a repeated measure. What is a repeated measure? We're measuring the same life satisfaction, but the first measurement was done pre-lockdown and the second was done post-lockdown. That's called repeated measures because it's the same variable measured at two different moments in time. This can be observational, like pre- and post-lockdown. It can also be in the context of experimental research. For example, you want to see whether people's knowledge of Philippine history, say of martial law, will improve after their exposure to an educational video. So you measure their knowledge of martial law prior to the video, you show the video, and then you measure their knowledge again after their exposure to it. That is an example of an experiment. But the lockdown example is observational; it's non-experimental.
The lockdown happens in a natural setting, meaning the researchers have no control over the lockdown, but they do have control over when to measure. So they measured prior to the lockdown and then after it. Now, when the dependent variable is continuous and the design is within groups, or what we call repeated measures: if you have two observations, for example pre versus post, taken from the same group, then we use a paired t-test. The independent t-test is "independent" because the groups are independent; this one is a paired t-test because the groups are matched. So if it's parametric, it's a paired t-test, and if it did not meet the requisites of parametric tests, we use a non-parametric statistic. The non-parametric version of the paired t-test is the Wilcoxon signed-rank test. Now, suppose you have three or more observations, like pre, during, and post. Or, for example, after a person has a stroke, you want to measure that person's quality of life every year: year zero, meaning the year the person had the stroke, and then you follow that person through year one, year two, year three. So you have more than two repeated measures within the same group, and then we use a repeated measures ANOVA. The earlier one was a one-way ANOVA, and this one is a repeated measures ANOVA, which is easy to remember because you're using repeated measures. And for the non-parametric version of the repeated measures ANOVA, in case you do not meet the requisites of normality, you use a Friedman test. So that's it. Let's summarize using this table. For the different tests of difference, we use these tests when the IV is categorical and the dependent variable is continuous. If the difference you're testing is between groups, there are two groups, and you have satisfied the requirements of parametric tests, such as normality, you use an independent t-test.
If it did not meet the parametric requirements, you use Mann-Whitney U. If your IV has three or more categories, then if you meet the parametric requirements, you use a one-way ANOVA; if you don't, you use Kruskal-Wallis. Now, for within groups or repeated measures: if there are two observations, like pre and post, you use a paired t-test if it's parametric and a Wilcoxon signed-rank test if it's non-parametric. If there are three or more observations, like year one, year two, year three, and so on, you use a repeated measures ANOVA for parametric and a Friedman test for non-parametric. Now, when you read articles, you will also come across more complex statistics. We will not be doing them in class, but I'll give you an idea of what they are. For example, suppose you have a categorical independent variable and a continuous dependent variable, and you have a second factor that you want to check for its influence on the relationship between the IV and the DV. Then you might see a two-way ANOVA. In a two-way ANOVA, the first "way" is the IV, and the second "way" may be another independent variable or a covariate that you want to control for. As an example of a two-way ANOVA, let's say the independent variable is income, so low, medium, high, and your dependent variable is life satisfaction, but you also want to see whether there is an interaction between gender, male and female, and income. Think of intersectionality: the experience of a person with low income may be different depending on whether that person is female or male. So you can put that second factor in a two-way ANOVA.
But again, we will not be doing it in this class; that's just an example. Now, what if your dependent variable is also categorical? Because it's not continuous, you won't be able to use any of the tests I have listed. If the dependent variable is categorical and you want to see the association between a categorical independent variable and a categorical dependent variable, we use a chi-square. The good thing about chi-square is that it is itself a non-parametric test, so whether or not you have achieved normality, it's okay. The only problem with chi-square is that if you have three or more categories for either the independent or the dependent variable, it's really hard to picture the direction of the association, whether higher on one goes with lower on the other. It's really most useful when you have dichotomous IVs and dichotomous DVs, which is why we usually see chi-square with two-by-two tables. We see this a lot in medicine, because outcome variables in health are usually: you get sick or you don't get sick; you have COVID or no COVID. And exposures are usually treated as exposure versus no exposure. So that's a two-by-two table, and that's usually how we use chi-square. But in the behavioral and social sciences, you usually work with dependent variables that are continuous in nature. I'm just presenting the possibility of chi-square because in the future you may need to work with outcomes, dependent variables, that are categorical in nature. And then there's also what we call an ANCOVA, an analysis of covariance. In the two-way ANOVA example I gave you, the independent variable is categorical and the second factor is also categorical. The first is income, low, medium, high, and the second factor is, let's say, gender, male and female. What if your second factor...
is not a categorical variable but a continuous variable? For example, you want to see the interaction of low, middle, and high income with age, since age is a continuous variable, on life satisfaction. If the first factor is categorical with multiple categories, your covariate is a continuous variable such as age, and you want to see their interaction on the dependent variable, say life satisfaction, then we use an ANCOVA. But again, we will not be discussing that in our class, and I will not be asking you to do it. I'm mentioning it because in the future you might read published research and encounter these things. You might also encounter what we call a MANOVA. MANOVA is multivariate analysis of variance, where you have more than one dependent variable. Again, we will not use that in this class. When there are two or more dependent variables, we use MANOVA, but in our classes we usually work with only one; we are only interested in one outcome variable. Also, when you reach the level of two-way ANOVA, ANCOVA, and MANOVA, they're no longer bivariate statistics. They're already multivariate, because with bivariate you're working with variables two at a time. If in one test you are working with three or more variables, it's already multivariate. We will get to multivariate later, especially in regression. But we won't cover the other types of multivariate statistics, like between-subjects repeated measures ANOVA, ANCOVA, MANOVA, and two-way ANOVA, because those are very intricate and complex. Our only multivariate test for this class will be regression. So, those are the different tests for difference.
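The summary table of tests for difference (categorical IV, continuous DV) can be sketched as a small lookup in Python. This is only an illustration of the decision steps described in the lecture: the function name and its arguments are hypothetical, and deciding whether the data "meets parametric requirements" is left to the analyst.

```python
def choose_difference_test(design, levels, parametric):
    """Pick a test for difference when the IV is categorical and the DV is continuous.

    design:     "between" (different participants per category) or
                "within" (same participants measured repeatedly)
    levels:     number of groups or repeated observations (2 or more)
    parametric: True if the parametric assumptions (e.g. normality) are met
    """
    if design not in ("between", "within"):
        raise ValueError("design must be 'between' or 'within'")
    if levels < 2:
        raise ValueError("a test for difference needs at least two levels")

    size = "two" if levels == 2 else "three_plus"
    table = {
        ("between", "two", True): "independent-samples t-test",
        ("between", "two", False): "Mann-Whitney U test",
        ("between", "three_plus", True): "one-way ANOVA",
        ("between", "three_plus", False): "Kruskal-Wallis test",
        ("within", "two", True): "paired-samples t-test",
        ("within", "two", False): "Wilcoxon signed-rank test",
        ("within", "three_plus", True): "repeated measures ANOVA",
        ("within", "three_plus", False): "Friedman test",
    }
    return table[(design, size, bool(parametric))]

# Examples taken from the lecture's scenarios.
print(choose_difference_test("between", 2, True))    # male vs female, normal data
print(choose_difference_test("between", 3, False))   # low/middle/high income, non-normal
print(choose_difference_test("within", 3, True))     # pre, during, post lockdown
```

Keeping the table as data rather than nested if-statements makes it easy to compare against the lecture's summary slide row by row.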
After this PowerPoint presentation, I'll demonstrate how to do the independent t-test, Mann-Whitney U test, one-way ANOVA, Kruskal-Wallis, paired t-test, Wilcoxon signed-rank test, repeated measures ANOVA, and Friedman test in jamovi.
Hello, everyone. I think we're starting. So now I'm going to demonstrate how we can do the different tests for difference. First, let's look at this in the context of the research study we've been working on for practice, our practice data. Let's look specifically at question number four, because it's an inferential question and basically asks about the relationship between social demographic characteristics and national resilience. If you remember our discussion earlier, we usually do tests for difference when your independent variable is categorical and your dependent variable is continuous. When you look at question five, national resilience and life satisfaction are both continuous variables, so we will use a different test for number five, and that is what we're going to discuss next week. Even within the social demographic characteristics, you will also see continuous variables. For example, in this sample case you have age in years, and age in years here is a continuous variable. So we will not use a t-test or ANOVA or their variants for it, because that is not a test for difference but a test for correlation, which again will be our discussion next week. For the meantime, we will not be considering it. Now remember, in our class we are using complex hypotheses. A complex hypothesis means that one question or one hypothesis actually involves a lot of bivariate tests. Why do I say a lot of bivariate tests?
Because social demographic characteristics cover multiple characteristics: age, sex, region, highest educational attainment, and income. It's going to be the same for the data you're working on: more than one. In this case we have five social demographic characteristics, and then we have national resilience, which also has several parts. National resilience has three domains: identification with my country, social solidarity and social justice, and trust in public institutions. You too have more than one domain for the main variable in your study. So you will be doing many bivariate tests in class. For the meantime, we are going to look at bivariate tests that test for difference across categories, so we have to look at the categorical variables. First we have to identify the nature of the variables. For example, for sex assigned at birth, we assigned two categories, male and female. So that's dichotomous, and we have two choices. If it's dichotomous with independent groups, it's either the independent samples t-test if it's parametric or Mann-Whitney U if it's non-parametric. It's the same for region, because here you can see that region is NCR and non-NCR. It's just two groups, two categories, so we use the same thing, independent t-test or Mann-Whitney U. The way you categorize or measure the variables is also very important. For example, if region here were the 12 regions of the Philippines, the independent t-test would no longer be applicable, because there would be more than two groups. So it really depends on how you measure the variable. Some variables are dichotomous because there's no choice, like male and female; usually in research there's no other way. But for region, you have many ways of imagining it.
But in this case, I'd like to operationalize region as NCR and non-NCR. And then we have highest educational attainment. We have three categories: high school and below, college, and postgraduate. Same for income classification: we have low, middle, and high. That information tells you what type of test for difference you're going to use to see the relationship between these specific variables and national resilience. And that is ANOVA: one-way ANOVA for parametric and Kruskal-Wallis for non-parametric. And then, we don't have it in this specific example, but later we will do repeated measures in a different dataset. So let's start. Let's work first on the question: is there a significant difference in national resilience, for the three domains, when grouped according to sex assigned at birth? You go to Analyses, click T-Tests, and then click Independent Samples T-Test. Now, you have to put the dependent variable, which is the outcome variable, in that box, as you can see here. Let's start with identification with my country. Then you put the grouping variable in this window, and the statistics will appear. As you can see, the first thing that you have to look at is the p-value. If the p-value is less than 0.05, then we reject the null hypothesis. Here, the p-value is more than 0.05, so we don't reject the null hypothesis. So what do we say? There is no significant difference, at least in the independent samples t-test, because later it can change. Let's now check if that's going to be the same with solidarity and social justice and with trust in public institutions. So let's put those here. You can put them together. So you now have three. You have the t-statistics here. What you have to do with these is just report them. I'll teach you about reporting bivariate tests next week.
But here you have to look at the p-value. This is the first thing we look at. You can see that it's more than 0.05 for all three. So we cannot reject the null hypothesis, and therefore there's no significant difference. And then how do you get the mean and SD per group? You click Descriptives. Here you'll be able to see the mean and SD for 0 and 1. The 0s are the females and the 1s are the males. As you can see, the mean for the males is almost the same as for the females. Same with the SDs. So you can really tell. You can even visualize it using descriptive plots. You can see the difference between the males and the females, but it appears that there's no significant difference here. However, here we are assuming that the data are parametric. Remember that we first have to check whether they are parametric or not. You can check the normal distribution right here. So let's click the normality test. You can see the Shapiro-Wilk test for the three variables: identification with my country, solidarity and social justice, and trust in public institutions. All of them have a Shapiro-Wilk p-value of 0.001, which is less than 0.05. So we cannot assume normality. The data are not normally distributed, so we cannot use a parametric test. What we will use is a non-parametric test. Here it says Student's t-test; that's the default. But you can change this to the non-parametric version, and that is Mann-Whitney U. As you can see, the statistic has changed; it is now Mann-Whitney U. And when you look at the p-values, for identification with my country there is now a significant difference. We can say that because the p-value is now less than 0.05. You can see that the p-value here is 0.024. That means there is a significant difference. At least using a non-parametric test, you are now able to see a significant difference between males and females. So, who is higher?
So, that's the time you look at the group descriptives here. As you can see, the males, the 1s, have a higher mean. That means males have a higher score for identification with my country compared to female respondents. And then in research, you will have to explain why there might be a significant difference. Their scores are comparable for solidarity and for trust in public institutions. Okay? So that's it. That's how you do an independent samples t-test. Now let's replace sex and do the second dichotomous variable here, which is region. So I'll put region here. There. And again, let's check whether we can use Student's t-test. We can't, because the Shapiro-Wilk p-values remain at 0.001, so they are less than 0.05. It means there is a violation of normality, so we keep using Mann-Whitney U. Now we look at the p-values here. They show that for identification with my country and trust in public institutions, it's less than 0.05. For solidarity, it's higher than 0.05, so there is no significant difference there. Since there is a significant difference for identification with my country and trust in public institutions, we can now look at the descriptive statistics to tell us who is higher: the NCR people or the non-NCR people. The 0s are the non-NCR people and the 1s are the NCR people. As you can see, the non-NCR people have a higher score for identification with my country and trust in public institutions. So those who live outside NCR seem to have higher levels of national resilience in terms of identification with my country and trust in public institutions. For solidarity, the values are comparable. And as you can see here in the means, the difference is very minute compared to identification with my country and trust in public institutions. So that's how you do your independent samples t-test and Mann-Whitney U.
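The same workflow (check Shapiro-Wilk, choose Student's t-test or Mann-Whitney U, then read the group means) can be sketched outside jamovi. This is only an illustrative sketch, not what jamovi runs internally: it assumes SciPy is installed, the helper name `compare_two_groups` is hypothetical, the scores are made up rather than taken from the class dataset, and checking normality separately within each group is an assumption that may differ from how jamovi computes its normality test.

```python
from statistics import mean, stdev

from scipy import stats

ALPHA = 0.05  # significance level used throughout the demonstration


def compare_two_groups(group_a, group_b, alpha=ALPHA):
    """Run Student's t-test if both groups pass Shapiro-Wilk, else Mann-Whitney U.

    Hypothetical helper for illustration only.
    """
    # Assumption: normality is checked within each group separately.
    normal = all(stats.shapiro(g).pvalue >= alpha for g in (group_a, group_b))
    if normal:
        test_name = "Student's t-test"
        result = stats.ttest_ind(group_a, group_b)  # equal variances assumed
    else:
        test_name = "Mann-Whitney U"
        result = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")
    return {
        "test": test_name,
        "statistic": float(result.statistic),
        "p_value": float(result.pvalue),
        "significant": bool(result.pvalue < alpha),
        # Group means and SDs tell us which group is higher.
        "means": (mean(group_a), mean(group_b)),
        "sds": (stdev(group_a), stdev(group_b)),
    }


# Made-up scores (not the class data): group 0 = female, group 1 = male
females = [3.1, 3.4, 2.9, 3.8, 3.3, 3.0, 3.6, 3.2, 3.5, 2.8]
males = [3.9, 4.2, 3.7, 4.5, 4.0, 3.8, 4.4, 4.1, 4.3, 3.6]

outcome = compare_two_groups(females, males)
print(outcome["test"], outcome["p_value"], outcome["significant"])
print("means (female, male):", outcome["means"])
```

As in the demonstration, the p-value decides whether the null hypothesis is rejected, and the group means are only consulted afterwards to see which group is higher.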
So as you can see, it's simple; it's just clicking here. So what are the values that are important for you to look at? It's the p-value, because that will decide whether you reject or do not reject the null hypothesis. After you've seen that, you have to report the means. You have to look at the means and see which is higher, so that you have an idea of who is higher if there is a significant difference. It's not really necessary, but if there's no significant difference, you can still check the mean and SD to confirm that there really is no difference, or that their scores are very close. You can also click effect size here, which is Cohen's d. And as you can see, the effect size appears here. The effect size is usually important for Student's t-test, because that is where Cohen's d is used. As you can see here, the effect size is around low to moderate. But for Mann-Whitney U, we usually don't really have to report it anymore. Just in case you want to see it, it's here. So that is for your dichotomous variables. Let's move to your categorical variables with three or more groups. You have highest educational attainment and income classification. So let's go there: click ANOVA and then One-Way ANOVA. Then you identify the grouping variable. For example, let's look at educational attainment. Let's put educational attainment here as the grouping variable. And then we can also include all of the continuous variables as dependent variables. Okay, and then look at the p-values and identify whether there is a significant difference across the groups. As you can see, the p-values are higher than 0.05. Therefore, we do not reject the null hypothesis. There is no significant difference in the three domains of national resilience when grouped according to educational attainment. Again, we can examine that by adding the descriptives tables. So the means are shown there.
You can also add descriptive plots so you can see the differences. As you can see, there are overlaps, so the groups really do not differ. So let's just move to another one that is more interesting, which is income. We have low, middle, and high income. As you can see, for the three domains, the p-value is less than 0.05. If it's less than 0.05, then we reject the null hypothesis and infer that there is a significant difference. Now that we know there is a significant difference, let's do a post hoc analysis. So the question is: okay, there are differences, but remember, there are three groups: low, middle, and high income. So which of the low, middle, and high income groups actually differ from one another? We now click Post-Hoc Tests and choose the Tukey test. And now you will see here a cross-tabulation of 1 versus 2, 1 versus 3, and 2 versus 3. You will see that the mean difference, low income minus middle income, is positive and the p-value is 0.01. So it means that there's a significant difference between 1 and 2, with 1 having higher levels of identification with my country. Same thing with high income: low income versus 3, which is high income. Look at the p-value. It's also less than 0.05. So there's a significant difference between 1 and 3, with lower-income people having higher levels of identification. So it appears that the poor have a higher identification with the country compared to the rich. And then when you look at middle versus high income, the p-value is 0.513. So it means that there is no significant difference between middle and high income. So it's really just the low-income group having significantly higher scores for identification with my country compared to their middle- and high-income counterparts. Let's look at the others. Same thing: higher solidarity
scores for low-income versus middle and for low-income versus high, as indicated by the p-values that are less than 0.05 here. And same thing, 2 and 3 also do not differ, because the p-value is higher than 0.05. So it means that the low-income group is significantly higher than their middle- and high-income counterparts in terms of solidarity. Let's look at trust in public institutions. Same thing for low-income individuals: the p-values for low versus middle and low versus high are less than 0.05. So there's a significant difference, and specifically, the low-income group is significantly higher. At the same time, middle and high income are comparable because the p-value is more than 0.05, and therefore we do not conclude that they are significantly different. So again, same thing. It's the low-income group who report higher solidarity, higher trust in public institutions, and higher identification with the country. Again, in research, we'll have to explain, discuss, and interpret why you think this would be the case. But again, we're doing this on the assumption that the data are parametric. You really have to test first whether they are parametric or not. So how do we test whether we can use a parametric test? There's also an option here for the normality test. So let's click the normality test. And as you can see, same thing: Shapiro-Wilk is significant for the three. So apparently we should not have done an ANOVA. I just did it so you can see what a one-way ANOVA looks like, but based on our Shapiro-Wilk results, we cannot do a one-way ANOVA. We have to do the non-parametric version of ANOVA, which is Kruskal-Wallis. So where do we find it? Click ANOVA, then click One-Way ANOVA, Kruskal-Wallis. So the bad thing, the not-so-happy thing about non-parametric tests, is that they are not as insightful in terms of findings compared to parametric tests.
As you can see, the other analytical tools you have here aren't as extensive as those for the parametric test. But since that's what the data are, that's the behavior of the data, that's the distribution of the data, there's really not much we can do about it. So let's do the same thing here. We put the three continuous dependent variables here, and then we put income class here as the grouping variable. Same thing: the p-value is less than 0.05. But as you can see, the statistic reported here is now chi-square (χ²). It's no longer F. If you look at the one-way ANOVA, F is what it reports. Here is the F. So it reports that. But what you really use to interpret is the p-value. And then you can do pairwise comparisons to see whether there's a significant difference between the groups. For example, low versus middle, significant. Low versus high, significant. Middle versus high, not significant. Same thing here: low versus middle, significant. Low versus high, significant. Middle versus high, not significant. Same thing with trust in public institutions: low versus middle and low versus high are significant; middle versus high is not significant. The W is the statistic. But where do you find which is higher or lower? You have to go back to the means. So again, 1 is higher than 2 and 3, obviously, in our findings. So it's interesting. It is the poorer populations, the poorer Gen X and Gen Y, who have better identification, better solidarity, and better trust in public institutions compared to their middle- and higher-income counterparts. There are many speculations about that, but that won't be part of my discussion. Maybe we can talk about it in class later. But yeah, that's how you do a non-parametric Kruskal-Wallis test. Now, what if we want to use repeated measures, meaning a within-subjects analysis? For example, we want to see whether quality of life within this group of 700 participants has improved
or has changed from pre- to post-pandemic. So here you have the same measure of quality of life. The only difference is when it was measured. This is called a repeated measure: quality of life was measured multiple times in different periods. If, for example, the question is whether there is a significant difference between the pre and post quality of life scores in the sample, then we will use a paired t-test, because there are just two observations. So let's do the paired t-test here. Again, click T-Tests and then Paired Samples T-Test, and then you just have to include the paired variables here. So let's look for them. We want pre and post, so I put them here. You see here that there is a t-statistic and a p-value, and it's less than 0.01. So it appears that there is a significant difference in quality of life between pre-lockdown and post-lockdown. If you want to see which is higher or lower, you can click Descriptives. We can see that the mean for pre-lockdown is higher than for post-lockdown, which makes sense. And then you can also include the effect size. Cohen's d is 0.46, so there is a moderate effect size. But again, let's first check the normality test. So now we have the normality test. When you look at Shapiro-Wilk, the p-value is less than 0.05. So Shapiro-Wilk is significant; therefore we cannot assume normality, and we have to do a non-parametric test. What would be the non-parametric test? The Wilcoxon signed-rank test. So here is the Wilcoxon option. Click it. So the output is different now, as you will see. But you will see here that the conclusion is the same: the p-value is still 0.001. Okay. So that means there's a significant decrease in quality of life post-lockdown compared to pre-lockdown. So that's the paired test. As you can see, we are measuring the same sample with the same measures, but at different times.
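For readers who want to reproduce the paired comparison outside jamovi, here is a rough sketch. It assumes SciPy is available; `compare_paired` is a hypothetical helper name, the pre and post scores are invented, and running Shapiro-Wilk on the paired differences is the usual convention for paired designs rather than something confirmed by the demonstration.

```python
from statistics import mean

from scipy import stats


def compare_paired(before, after, alpha=0.05):
    """Paired t-test if the paired differences pass Shapiro-Wilk, else Wilcoxon.

    Hypothetical helper for illustration only.
    """
    differences = [b - a for b, a in zip(before, after)]
    # Assumption: normality is assessed on the paired differences.
    normal = stats.shapiro(differences).pvalue >= alpha
    if normal:
        test_name = "paired t-test"
        result = stats.ttest_rel(before, after)
    else:
        test_name = "Wilcoxon signed-rank"
        result = stats.wilcoxon(before, after)
    return {
        "test": test_name,
        "p_value": float(result.pvalue),
        "significant": bool(result.pvalue < alpha),
        # Positive value: the first measurement is higher on average.
        "mean_difference": mean(differences),
    }


# Made-up quality-of-life scores for the same 10 people at two time points
pre = [7.2, 6.8, 8.1, 7.5, 6.9, 7.8, 8.0, 7.1, 6.5, 7.7]
post = [6.5, 6.6, 7.0, 7.1, 6.0, 7.2, 7.1, 6.8, 6.1, 6.9]

paired_outcome = compare_paired(pre, post)
print(paired_outcome)
```

A positive `mean_difference` together with a significant p-value corresponds to the reading in the demonstration: the earlier measurement is significantly higher than the later one.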
So the independent variable here, if you will, is the time of measurement, but the measures are the same. Earlier, if you noticed, with independent samples there was a single period of collection, but we were differentiating across groups within the sample. So that's the difference. It's called paired samples because the measures are paired or repeated. But we use Wilcoxon here, the non-parametric test, because we did not meet the parametric assumption. So what if there are more than two? For example, we want to see the difference across pre-lockdown, year one of lockdown, year two of lockdown, and post-lockdown. So let's see. Let's move to ANOVA and click Repeated Measures ANOVA, assuming that this is parametric. There are three things that you have to do. First, you have to identify the factor. Quality of life is the factor. Then you have to identify the levels. So first the levels are three: you have pre-lockdown, year one, and year two, and then you have to add the fourth, which is post. Okay? And then you have to identify here in the repeated measures cells which variable is which. So I'm putting pre here, then year one, year two, and post. The findings will come out there. Let's just wait; it's just a bit slow. Okay. There, it came out. As you can see, the p-value is less than 0.001. So that means there is a change across time. So how do we ascertain where the changes were? First, we can visualize it. Oh, there's no visualization here. We can do post hoc tests instead. So let's do post hoc tests. Okay, it will slow down. Let's wait for it. So for the post hoc tests, you can see here that for pre versus year 1, the p-value is less than 0.05, so there was a change in year 1. And the mean difference is 1, a positive value. So it means that pre is higher than year 1.
Pre is also higher than year 2, and pre is also higher than post. As for year 1 versus year 2, year 1 is lower than year 2, as seen in the negative score here. So that means this one is smaller and year 2 is higher. So in 2020, quality of life was lower, and it went up in year 2. And 2020 is also lower than 2022. Year 1 is lower, because of the negative value. And then year 2 to post is negative, so that means post is higher than year 2. And as you can see, the p-values are all less than 0.05. So it means that across the board there are changes. So pre was high, it dropped in year 1, then there was a significant increase in year 2, and then a significant increase post-lockdown. So high at pre, a dip in 2020, an increase in 2021, and then another increase in 2022. So that was the movement, and all of the changes are significant according to the p-values. But again, let's do the assumption checks. There's no normality test here, so we will get the normality tests elsewhere. Okay, let's go to Exploration and do the normality tests there, because there's none here. So for Shapiro-Wilk, let's put in the four variables. As you can see, Shapiro-Wilk is significant for all of them, so all are less than 0.05. We cannot use the repeated measures ANOVA. So we will now use a repeated measures ANOVA that is non-parametric, and that's Friedman. So let's do this. Let's just put in the four. So where are those? There. You can see it's less than 0.001. As you can see, the output is not very insightful. But there is a significant difference across time in quality of life, from pre, first year of lockdown, second year of lockdown, and post-lockdown. We can also produce descriptives. So as you can see, our reading was correct. The mean was high at pre, got low in the first year, got a bit higher in year two, and then post got higher.
But pre is still higher. That means quality of life was not recovered post-lockdown. You can also do a descriptive plot. There. Actually, this non-parametric one seems even better because it has a plot. So in the descriptive plot you can really see what the movement was. From up here, it dipped in year one and then climbed incrementally. But as you can see, it did not get back to the pre-pandemic or pre-lockdown quality of life. It's still lower. So that is how you use a repeated measures ANOVA. But since you weren't able to use... So this is just a demonstration for the paired t-test, Wilcoxon, repeated measures ANOVA, and also Friedman. What we just did is Friedman. You will not be able to do it on your own because you haven't collected repeated measures data. So you will not be doing this, but I demonstrated it so you have an idea just in case you need it in the future. For example, especially for the international studies majors here, you may want to see the changes in GDP rates in a specific country from year 1 to year 20. A repeated measures ANOVA would be a good way to ascertain if there is a significant difference in GDP across countries from year one to whatever year you want to end. So that's another example of a use of repeated measures ANOVA in another social context. Okay, so there. So that is our jamovi demonstration for the different tests for difference.
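As a closing illustration of the last test in the demonstration, the Friedman test for four repeated measurements can be sketched in Python. This is a minimal sketch under stated assumptions: SciPy is installed, the quality-of-life scores are invented to mimic the pattern described (high before lockdown, a dip in year one, then a gradual recovery that stays below the starting level), and the variable names are hypothetical.

```python
from statistics import mean

from scipy import stats

# Made-up quality-of-life scores for the same 8 participants at four time points
pre = [8.0, 7.5, 8.2, 7.9, 8.4, 7.7, 8.1, 7.8]
year_1 = [5.0, 4.8, 5.5, 5.1, 5.6, 4.9, 5.2, 5.0]
year_2 = [6.0, 5.7, 6.3, 6.1, 6.5, 5.8, 6.2, 5.9]
post = [7.0, 6.6, 7.2, 7.0, 7.4, 6.8, 7.1, 6.9]

# Friedman test: non-parametric counterpart of the repeated measures ANOVA.
# It ranks the four scores within each participant and compares the rank sums.
friedman = stats.friedmanchisquare(pre, year_1, year_2, post)
print("chi-square:", friedman.statistic, "p-value:", friedman.pvalue)

# The test only says that the time points differ somewhere; the means show the
# direction of the movement across time.
time_point_means = {
    "pre": mean(pre),
    "year_1": mean(year_1),
    "year_2": mean(year_2),
    "post": mean(post),
}
print(time_point_means)
```

As with the jamovi output, the test statistic here is a chi-square value, the p-value is what gets interpreted, and the descriptives are needed to see which time points are higher or lower.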