Hi everyone. In this video I will be talking about exploratory factor analysis using Stata. Exploratory factor analysis is a technique designed to let you explore the interrelationships among a set of measured, or observed, variables, with the goal of discerning whether some underlying factors account for those relationships. On your screen I have a couple of exploratory factor analysis models, each a two-factor model, and these are the models we will ultimately arrive at by the end of this demonstration. The diagramming will look familiar if you know structural equation modeling: the boxes represent the measured or observed variables in the data set, and the ovals labeled spatial and verbal represent latent, or underlying, variables, with the idea that those latent variables may be accounting for the relationships among the observed variables. The arrows pointing from the latent variables to the measured variables represent the idea that variation in the observed variables reflects the latent variables. The small circles on the left refer to measurement error associated with each variable, and the term we use for that is uniqueness; so in both of these models there is measurement error associated with each of the measured variables in our data set. What we'll be doing is taking this set of six measured variables and performing an exploratory factor analysis. The diagram shows where we'll end up at the end of the video, but it also captures the general idea: we take a set of observed, or manifest, variables; we go through a series of steps to determine how many factors might account for the interrelationships among them; once we've made that determination we perform the main analysis and retain that many factors; and then we try to interpret the factors based on the relationships between the manifest variables and the latent factors. There is a series of steps involved in factor analysis, and when carrying out the analysis with any program, including Stata, you need to think through those steps. The way I've arranged this presentation is to walk you through the decision steps and show you the associated procedures in Stata. If you're familiar with a program like SPSS, with drop-down menus where a few points and clicks generate all of the output in one step, that's great; but in Stata the relevant commands are distributed across different pieces. There is the main exploratory factor analysis command where you carry out the main analyses, there are post-estimation commands, and there are other commands that are not part of the base program and have to be installed as packages.
To tell you the truth, SPSS is limited in its own way too; it doesn't contain everything either. As we go through the video, I'm going to open a do-file, which is essentially a syntax editor, type in the syntax, and explain what's going on as we go. My recommendation, particularly for analyses like this one that involve many procedures and decisions, is to develop a strategy for writing a do-file that you can save and reuse in the future. So that's how I'll proceed in this demonstration. Let me go ahead and open Stata. The data are already imported, and you can download a copy by following the link provided in the video description. These are data from 301 seventh- and eighth-grade students, measures of intellectual ability from the classic Holzinger and Swineford (1939) data set, which is used quite often to illustrate factor analysis and related procedures. The variables in our data set are visual perception, cubes, lozenges, paragraph completion, sentence completion, and word meaning. Just looking at the variables themselves, it appears that some might reflect a verbal-ability factor and some might reflect more of a spatial-ability factor. Typically, though, when we go into factor analysis we may not have a very clean idea about where things will load; we might only have a general sense of which variables go together, and that is where exploratory factor analysis becomes handy: we can explore the underlying structure of the data and try to identify the factors that account for the interrelationships among our variables. In our raw data set we have 301 cases, and the first step in exploratory factor analysis is studying the correlations among the variables themselves: compiling the information into a correlation matrix and then studying aspects of that matrix to determine whether it even makes sense to perform an exploratory factor analysis. One thing we don't want is a correlation matrix in which the variables are essentially uncorrelated, or only minimally correlated, with each other; a matrix where the correlations all hover around zero is not an ideal situation for exploratory factor analysis. By the same token, we also don't want collinearity among our variables; linear dependencies in the correlation matrix can produce problems in model estimation. That's the first step in the process.
If we decide it is reasonable to carry out an exploratory factor analysis, we proceed to the next step, which is to explore possible factor structures; in other words, to determine how many factors might account for the interrelations among our measured variables. I'm going to open a do-file by clicking the icon at the top of the screen (I'll circle it here), and I'll increase the font size once there's something in it. I'll start with that first step of looking at the correlation matrix to decide whether it's even reasonable to carry out the factor analysis. I'll type pwcorr followed by the names of the variables I'm subjecting to factor analysis: visperc for visual perception, then cubes, lozenges, paracomp, sencomp, and wordmean. I'll be reusing these names, so I'm going to copy them now so I can paste them in later as needed. Then I type a comma and the listwise option, for listwise deletion. The reason is that when I carry out the factor analysis, it will by default use only the cases that have observations on all of the variables being analyzed; if I leave listwise off, pwcorr will use pairwise deletion to construct the correlation matrix. We'll work under the assumption that we have the complete data set. Now I can zoom in a bit so you can see more clearly, highlight the command, click the execute selection button, and we get a correlation matrix. In this matrix we're looking for evidence that it is reasonable to carry out the factor analysis, and one of the main things is to look for correlations that are non-trivial. A common rule of thumb is to check whether at least several correlations are above 0.30, and looking at this matrix we have some good-sized correlations: a 0.44 here, a 0.3398 there, a 0.7332 and a 0.705 over here, a 0.72 over there. With multiple correlations of that size in the matrix, it looks reasonable to carry out the factor analysis.
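As a rough sketch of the correlation-matrix command just described in a do-file (the variable names here follow the ones read out above; adjust them to match the names in your copy of the data):

    * correlation matrix with listwise (casewise) deletion
    pwcorr visperc cubes lozenges paracomp sencomp wordmean, listwise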
That's the first piece of evidence. The next thing we want to check is whether we have linear dependencies that would produce a singularity. Singularity refers to a situation where a linear dependency prevents certain matrix operations from being carried out, so you end up with a solution that won't work. There is also the issue of multicollinearity, where the relationships in the correlation matrix are so high that they can generate inadmissible estimates. We don't want either of those problems, and to check for them we compute the determinant of the correlation matrix. You can think of the determinant as a generalized variance for the correlation matrix, and we don't want it to be zero; we want it to be greater than zero, and the common threshold given in Watkins's book on exploratory factor analysis with Stata, and in some other places, is a determinant greater than .0001. So we need to compute the determinant of our matrix. The first way, since we've already run the correlation matrix, is to pivot off the stored estimates: I type matrix R = r(C), which copies the stored correlation matrix into a matrix named R; then matrix d = det(R), which computes the determinant of R and saves it in its own matrix; and then matrix list d to display it. These commands come from Watkins's book, which is a really nice resource here. I highlight these lines, press the execute selection button, and you can see the determinant is 0.10845, clearly greater than the threshold of .0001, so we have no concerns about collinearity or singularity in our matrix.
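A sketch of that determinant check, assuming the correlation matrix from the previous command is available in r(C) (if your version of Stata does not store it after pwcorr, run correlate on the same variables first):

    * copy the stored correlation matrix and compute its determinant
    matrix R = r(C)
    matrix d = det(R)
    matrix list d    // value should exceed .0001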
The next set of diagnostics includes not only the determinant but also the Kaiser-Meyer-Olkin measure for assessing the factorability of the matrix, as well as Bartlett's test, and to generate those we can use a user-written package: we type ssc install factortest. The functions in this package are not part of the base Stata program; a user wrote it, and, much as with packages in R, we have to install it before we can use its commands. Here we're calling the package up from a website and installing it. Highlighting that line and clicking execute selection installs it (in my case it had already been installed, but that shows you the step). Then we use the command itself: I type factortest followed by the variable names, highlight it, and execute the selection, and we get several pieces of information. First is the determinant of the matrix again, 0.108, so technically we didn't need the matrix operations to get it, but it was worth showing those steps. Next is Bartlett's test of sphericity, a chi-square test of whether our correlation matrix differs significantly from an identity matrix. An identity matrix has ones along the principal diagonal and zeros on the off-diagonals, so if our correlation matrix were an identity matrix it would indicate that all our variables are uncorrelated with each other, which is not an ideal situation, as we discussed earlier when we checked whether the matrix had correlations of at least 0.3. This test is another way of judging whether the correlations are trivial or whether there is enough there to make factor analysis sensible, and a statistically significant Bartlett's test is regarded as an indication that it is appropriate to carry out the analysis. Keep in mind, though, that the test is affected by sample size: with a very large sample you can have trivial correlations and still get a statistically significant result, so it is good to look at both the test result and the correlations themselves. Nevertheless, we have a significant Bartlett's test, which signals a correlation matrix where it makes sense to carry out the factor analysis. The last index in the output is the Kaiser-Meyer-Olkin measure of sampling adequacy, often referred to as the KMO, another way to judge whether the analysis makes sense. Without getting into all the details, there are various rules of thumb for evaluating the factorability of a matrix with this index; the most common labeling system, proposed by Kaiser in his work during the 1960s and 70s, is that values from 0 to 0.49 are unacceptable, 0.50 to 0.59 miserable, 0.60 to 0.69 mediocre, 0.70 to 0.79 middling, 0.80 to 0.89 meritorious, and 0.90 to 1 marvelous. Our KMO in the output is 0.76, which according to that system is middling, but that is generally regarded as acceptable for carrying out the factor analysis. So taking all of the information up to this point together, it looks reasonable to carry out our exploratory factor analysis.
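A minimal sketch of those factorability diagnostics with the user-written factortest package (same variable-name assumption as above):

    * install once from SSC, then request determinant, Bartlett's test, and KMO
    ssc install factortest
    factortest visperc cubes lozenges paracomp sencomp wordmean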
Given that, what we want to do now is determine how many factors might underlie, or explain, the correlations among our set of measured variables. This is the next part of the exploratory phase: making the determination with respect to the number of factors. Once we've made that determination, the following step is to force that factor solution and interpret the results, which includes interpreting rotated factor loadings. There are a number of approaches to exploring the factor structure. One common approach is the eigenvalue cutoff rule: you subject the data to a principal components analysis, extract the eigenvalues, which summarize the variance among the set of measured variables, and retain the number of factors with eigenvalues greater than one. That minimum eigenvalue of one for factor retention is also referred to as the Kaiser criterion. It is not actually one of the better approaches to determining the number of factors; nowadays it is not an approach that factor analysts encourage on its own. Another approach is to examine a scree plot, which provides a visualization of the eigenvalues associated with the components from that principal components analysis; the idea is that we are essentially screening the data, using principal components analysis to generate the eigenvalues and then visualizing the possible factors in our data. To get the eigenvalues, we type pca followed by our variables; that's the first step, so I highlight it and run it. Keep in mind that the eigenvalue cutoff rule was developed in the context of PCA, so if you're going to use that rule it's not really advisable to use a common factor analytic technique to generate the eigenvalues. If we consider the eigenvalue cutoff rule, at least in relation to other information, we can look at the output: there is a column of eigenvalues, and two of the components have eigenvalues greater than one, so the eigenvalue cutoff rule would suggest a two-factor model. That is one piece of information suggesting we retain two factors. The other option is the scree plot (and we'll also cover some other approaches to determining the number of factors), so I type screeplot, highlight it, and run it, and we get a plot of the eigenvalues against the component numbers.
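A sketch of those two commands together:

    * principal components analysis to obtain the eigenvalues, then a scree plot
    pca visperc cubes lozenges paracomp sencomp wordmean
    screeplot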
Looking at this scree plot, notice that it looks like the side of a mountain. To determine the number of factors we look for the eigenvalues that sit on the steep part of the mountain, closer to the peak, versus the eigenvalues that sit at the base, in the rubble, so to speak. Here we have two eigenvalues that are clearly on the steeper part: the eigenvalue for component one and the eigenvalue for component two. As we move from component two to component three the slope levels off; from three to four it's not as steep as from two to three; from four to five there is maybe a little more slope but still not much change; and from five to six, not much change. In a nutshell, if you think of the plot as a bent arm, this point would be the elbow, and you retain the number of factors above the elbow. So the scree plot would also suggest retaining two factors. I will say that if you wanted to use a common factor analytic approach, such as iterated principal factors or maximum likelihood, and generate the scree plot from that, that would be perfectly permissible; but most commonly the scree plot comes from the principal components analysis. The next approach to determining the number of factors is parallel analysis. Parallel analysis involves taking the eigenvalues generated from our principal components solution and comparing them against randomly generated eigenvalues; the idea is that we retain the number of factors corresponding to the eigenvalues from our PCA that exceed the randomly generated eigenvalues. To do that we use another user-written package called fapara. To locate it we use the search command: I type search fapara and run it, and we get a box of information related to the package. Down where it says one package found, I click on the entry that says fapara from its website, and then click where it says "click here to install" to install the package. If you want a little more detail about the package, you can click on fapara.hlp, which is its help file. I've installed the package, so we're ready to carry out the parallel analysis, and once you've installed a package there's no need to reinstall it later, unless perhaps an update comes along that you want.
At this point we use the command associated with the package: fapara, then a comma, then pca, then reps(1000), where reps refers to the number of simulated data sets from which the random eigenvalues are generated. We can also set a seed: if I type seed followed by a number, say 12345, that guarantees I get the same randomly generated eigenvalues if I rerun the analysis later; if it doesn't concern you that rerunning would produce slightly different random eigenvalues, there's no need to set a seed, but that's the basic idea behind it. Note that we are pivoting off the earlier PCA: we ran it above, those eigenvalues are stored, and that's why the commands are laid out this way. I highlight all of this and click execute selection, and we get another scree plot, this time overlaid with the randomly generated eigenvalues; the dotted line connects the random eigenvalues, and you can see it crosses over just above the elbow we identified previously, which suggests that two factors may be accounting for the relationships among our variables. If you want the actual values, look at the table: it shows the eigenvalues from the initial PCA alongside the randomly generated eigenvalues. The first random eigenvalue is 1.19, the second is 1.0996, the third is 1.02. We retain the number of factors for which the eigenvalues from our data exceed the randomly generated eigenvalues; once a data eigenvalue drops below its randomly generated counterpart, you retain only what came before it. For factors one and two, the eigenvalues from the data exceed the randomly generated eigenvalues, but for the third the data eigenvalue is actually less, so parallel analysis supports a two-factor solution as well. We've now looked at three approaches to determining the number of factors: the eigenvalue cutoff rule, the scree plot, and parallel analysis.
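A sketch of the parallel analysis step, assuming fapara accepts the pca, reps(), and seed() options exactly as described above and that the pca command was run beforehand:

    * locate and install the user-written package (one time only)
    search fapara
    * parallel analysis of the preceding PCA: 1,000 simulated data sets, fixed seed
    fapara, pca reps(1000) seed(12345)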
One other option is the minimum average partial correlation, and that index requires yet another package. To obtain it we type ssc install minap and run it; in my case it had already been installed, but it installs again without a problem. Then we use the minap command associated with the package, typing it followed by the names of our variables, highlighting the line, and clicking execute selection. We get the minimum average partial correlations for the components, and essentially we want to select the number of components associated with the smallest MAP value. The output amounts to a comparison between models with one component, two components, three components, and so forth, and looking down the column of values, the smallest value, 0.10246, is associated with the solution that has two components. So on that basis we would again conclude that, in terms of the number of factors that may account for the relationships among our variables, a two-factor solution is warranted. At this point we have yet another piece of evidence for two factors.
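A sketch of the MAP step with the user-written minap package:

    * Velicer's minimum average partial (MAP) criterion
    ssc install minap
    minap visperc cubes lozenges paracomp sencomp wordmean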
There is one more approach we could use, and it also entails comparing different factor models; but whereas everything up to this point has used principal components (the eigenvalues from the PCA, the scree plot from the PCA, the minimum average partial correlations from the PCA), this approach makes model comparisons using a common factor analysis. Keep in mind that there are long-standing arguments and debates in the factor analysis literature about using principal components and treating the resulting solutions as factors, but everything we've discussed so far really developed around principal components analysis. The distinction is that principal components analysis summarizes all of the variation in a set of measured variables, whereas common factor analysis focuses on breaking down the correlations among the measured variables. So if you also want an exploratory technique based on common factor analysis, one option is to compare models with different numbers of factors against each other. Here we'll use maximum likelihood estimation: we carry out a maximum likelihood factor analysis, generate information indices for different factor models, and make a determination of the number of factors from those. I type factor, then the names of the variables, a comma, and then ml, which requests a maximum likelihood factor analysis. At this point we're still deciding how many factors to retain, so we don't really need all of the factor analysis output; all we want are the indices for the model comparison, so I can suppress the output by prefixing the command with quietly. If I highlight this and execute the selection, nothing shows up, and that's fine; it's because I typed quietly. You're welcome to generate the full output if you want, but it isn't the primary concern right now. Then we use the estat post-estimation command with factors: when I highlight estat factors and execute it, we get output in the form of Akaike's information criterion (AIC) and the Bayesian information criterion (BIC), along with the log likelihoods, for three models: essentially a one-factor, a two-factor, and a three-factor model are being compared. We're looking for the lowest values of AIC and BIC; the factor model with the lowest values is the one we would select. They don't always agree with each other, so in some circumstances a couple of solutions might be viable, but in the current case the lowest values of both AIC and BIC are associated with the two-factor model, so we have yet more support for a two-factor model when we run the main analysis.
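A sketch of that maximum likelihood model comparison:

    * ML factor analysis with output suppressed, then AIC/BIC for competing models
    quietly factor visperc cubes lozenges paracomp sencomp wordmean, ml
    estat factors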
We've now gone through several different ways of assessing the number of factors, and it is important to keep in mind that they won't necessarily all agree. They all agreed in this demonstration, but they don't have to; sometimes some approaches suggest, say, a two-factor model and others a three-factor model, and in that case you might run your primary analysis with both two and three factors, look at the interpretability of each, and make your final determination from that. Here, though, every procedure we went through supported the same conclusion of a two-factor model, so we're ready to run the final model and interpret the factors. To carry out that analysis we take the number of factors determined in the previous step and force a solution with that many factors, this time using a common factor analytic approach. There are various approaches available in Stata: iterated principal factors, which is also referred to as principal axis factoring; a non-iterated version; and maximum likelihood. For this demonstration I'll use iterated principal factors. I type factor, then the names of my variables, then a comma, then ipf for iterated principal factors, then factors(2) to retain two factors. I often use SPSS as well, in various analyses and in teaching, and if you are an SPSS user and want essentially the same results (the model fits exactly the same either way, but the proportion of variance accounted for and the cumulative variance are reported differently), you can get comparable output by adding altdivisor. If you're not an SPSS user you can still use altdivisor; it doesn't really matter, but it's handy if you want to compare results across programs. Finally I type citerate(25), which sets the number of iterations for the extraction. You'll notice the line is running past the vertical guide in the middle of the editor; if that bothers you, you can wrap the syntax by picking a space in the command, typing three forward slashes (///), and moving the remainder of the code to the next line, which tells Stata that the command continues on the next line. Now I highlight all of this and click execute selection, and we have our iterated principal factors solution. In the output we first see the eigenvalues; remember that we are retaining the first two factors, because in the previous step we determined that a two-factor model is the best representation of our data, so we use factors one and two, and when reporting, be sure to report the eigenvalues associated with those retained factors. We also have proportions and cumulative proportions: the proportions reflect the proportion of variation in the measured variables accounted for by each individual factor, and the cumulative proportion is the overall proportion of variance accounted for. The first factor accounts for about 41.7 percent of the variation, the second for about 14.6 percent, and cumulatively they account for about 56.28 percent, and you report these as well.
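A sketch of the final extraction command, wrapped onto two lines with ///:

    * iterated principal factors, forcing a two-factor solution
    factor visperc cubes lozenges paracomp sencomp wordmean, ///
        ipf factors(2) altdivisor citerate(25)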
Keep in mind that these eigenvalues and proportions of variance are associated with an unrotated solution. If you look toward the bottom of the output you'll see a table labeled factor loadings and unique variances, with columns for factor one and factor two. Typically when we carry out a factor analysis we don't only want to determine how many factors there are; we want to give them meaning, to name and define them, and the way we define them is by looking at their associations with the measured variables we subjected to factor analysis. The factor loading matrix contains the correlations between each measured variable and the factors, so the variables that correlate more highly with a given factor are the ones we use in naming, labeling, or describing that factor, and variables that correlate lower with a given factor are not used in that process. At this point, though, these are what are referred to as unrotated factor loadings, and there are well-known difficulties in trying to make sense of unrotated loadings; I won't go into all the details, but typically we don't interpret them. Instead we submit the solution to a rotation, essentially reorienting the relationships between the measured variables and the factors in factor space, which facilitates interpretation of the factors. So we won't interpret the factors from the unrotated loadings in this matrix; we'll perform a rotation to increase their interpretability. You'll also see a column labeled uniqueness, which is the proportion of variation in each measured variable that is unaccounted for by the factor solution; you can think of it as measurement error. Typically when reporting results we don't report the uniqueness so much as the communality, which is the proportion of variation that is accounted for by the factors. Notice too that each measured variable has a loading on each factor, so the communality reflects the combined influence of both factors in the model on that variable: it is the proportion of variation accounted for not by a single factor but by both, because both are related to the measured variable. I'll show you in a bit how to generate the communalities, but that is the general breakdown of what's going on at this point. Now we want to submit our solution to rotation so that we can interpret the factors. There are two general classes of rotation used in factor analysis, orthogonal rotations and oblique rotations, and within those classes there are different specific types.
Orthogonal rotations are designed to maintain the orthogonality of the extracted factors. Remember that up to this point factor one and factor two each accounted for a certain percentage of the variation in the measured variables, and we could talk about the variance accounted for in a cumulative sense because each factor is uncorrelated with the other factors in the model; that is what lets us add up the proportions of variance accounted for by each factor. The basic idea of the extraction is that the first factor accounts for the greatest proportion of variation in the measured variables; the second factor is extracted so that it is uncorrelated with the first but accounts for the next largest proportion; and if we continue, each factor extracted is orthogonal to, or uncorrelated with, the previous factors while accounting for successively smaller proportions of variation, which is why the eigenvalues and proportions of variance generally decrease as more factors are extracted. If we carry out an orthogonal rotation, it maintains that idea of uncorrelated factors: we are performing a rotation in which the factors remain at 90-degree angles to each other. Probably the most commonly used orthogonal rotation is varimax. So I'll type rotate, then a comma, then varimax, then kaiser, and we'll use altdivisor as well so the output is comparable to what we would see in SPSS; SPSS by default applies Kaiser normalization, which is why I'm including kaiser. I highlight this and run it, and now the output shows the results of the rotated solution. The values in the variance table are the eigenvalues associated with the rotated factors; what happens with rotation is that the variance accounted for gets spread out more evenly across the factors, so the variance accounted for before rotation will generally look different from the variance accounted for after rotation. In terms of the proportion of variance accounted for, the first factor accounts for about 36 percent and the second for about 20 percent; add those together and the two factors combined account for about 56 percent of the variance.
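A sketch of that rotation command:

    * orthogonal varimax rotation with Kaiser normalization
    rotate, varimax kaiser altdivisor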
Below that we have the loading matrix, which the output describes as a pattern matrix; in the context of an orthogonal rotation the pattern matrix and the structure matrix are the same, and I'll say more about the difference between them when we get to the oblique rotation. Again, what we do is look at the relationships between the measured variables and the factors, and typically we adopt a minimum loading criterion: if a measured variable correlates with a given factor at or above that threshold, we use it to help name, describe, or define that factor. There are various rules of thumb; the one I typically use is a factor loading of 0.40 or greater as the minimum criterion, and that 0.40 is an absolute value, because factor loadings can be negative as well as positive, so we use the absolute value when defining our factors. In our case there are no negative loadings, but it's worth highlighting the point. Other common criteria are 0.32 or sometimes 0.30 as the minimum loading, but we'll stick with 0.40. Looking at the pattern matrix, the measured variables paragraph completion, sentence completion, and word meaning all meet the loading criterion on factor one, while the remaining variables exhibit low loadings on it. Practically, that means we treat factor one as being defined by those three measured variables and name it using the concepts they measure: paragraph completion, sentence completion, and word meaning suggest that factor one might be called verbal ability. If you remember the two-factor diagrams at the beginning, where one latent variable was labeled verbal, this is where that came from; we name the factor based on those three high loadings, and we treat the remaining measured variables as not contributing to the definition, excluding them, at least from a conceptual standpoint, from our definition of that factor. Turning to factor two, the three variables visual perception, cubes, and lozenges all meet the minimum loading criterion. Unfortunately cubes and lozenges are not very self-explanatory, but together with visual perception we might treat all three as representing something like spatial ability, under the assumption that the variance they share reflects a spatial-ability factor, and the other measured variables load very low on that factor.
So based on this examination of the loading matrix, we would say that the relationships among paragraph completion, sentence completion, and word meaning appear to be largely a function of a verbal-ability factor producing the correlations among those variables, and that the relationships among visual perception, cubes, and lozenges largely reflect a spatial-ability factor. Obviously the correlation matrix also contains correlations between those two sets of variables, but in large part the relationships within each subset appear to be related to the factors we've just described. The next thing I want to show you is rerunning the rotation using an oblique rotation; a common oblique rotation is promax. Going back into the editor, I type rotate, then a comma, then promax. Promax has a parameter, the power, that governs how much the factors are allowed to correlate; the default in SPSS is four, and I'll come back to that. Then I type kaiser and altdivisor, as before. (I accidentally left varimax in the command the first time and got the varimax output again, so I copied the line, changed it to promax, and reran it.) Looking at the results, we again have variances for each of the rotated factors, but notice the statement in the output that the rotated factors are correlated. What that means is that we can no longer talk about each factor as accounting for an additive amount of variation in the measured variables, because the factors are correlated with each other. We can still refer to proportions of variance accounted for, I suppose, but we can't think about those proportions in an additive sense. Think back to the original iterated principal factors analysis: it accounted for a certain cumulative proportion of variation, and with the varimax rotation we were still accounting for that same proportion but could break it down into variation accounted for distinctly, or uniquely, by each factor. With an oblique rotation we are still ultimately accounting for the same overall proportion of variation, but because the factors are correlated we can't assign that variation separately to factor one and factor two and treat the amounts as additive. That is something to keep in mind when reporting your results. At any rate, the eigenvalue for factor one is 2.38 and the eigenvalue for factor two is 1.426.
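A sketch of the oblique rotation, written with the promax power of 4 that is specified a little later to match the SPSS default:

    * oblique promax rotation, power 4, Kaiser normalization
    rotate, promax(4) kaiser altdivisor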
Below the variance table is the loading matrix, and it is a pattern matrix; but unlike the matrix we saw before rotation and the one from the varimax rotation, this matrix cannot be interpreted as zero-order correlations between each measured variable and the factors, because the factors are correlated with each other. It is better to think of each loading as more akin to a standardized partial regression coefficient than a zero-order correlation: if I'm looking at the relationship between, say, visual perception and factor one, that loading reflects the relationship while partialing factor two out of the association, and the loading of visual perception on factor two reflects a partialing of factor one out of that association. So, again, these are not correlations between each measured variable and the factors; they are more akin to standardized partial regression coefficients. There is a lot to keep in mind as you go through this, and we can still apply our loading criterion in making our judgments, as long as we remember what the loadings actually capture. Using the 0.40 criterion, the paragraph completion, sentence completion, and word meaning variables all meet the minimum loading criterion on factor one, and visual perception, cubes, and lozenges meet it on factor two. One other thing to note as you work through these matrices is that they will not always be as clean and clear as what you see here. Sometimes a measured variable will meet the minimum loading criterion on multiple factors, and in that case there is more ambiguity in determining which factor the variable represents, or whether it even makes sense to include it in naming or defining the factors at all if it meets the criterion on both. That's where things get more complicated, but in the current case it is very clear which variables are related to which factors. If you want to further unpack the relationships between the variables and the factors, you can also generate a structure matrix, which does contain the zero-order correlations between each measured variable and the latent factors. It is not recommended that you use the structure matrix exclusively when naming or defining the factors, because it ignores the correlations among the factors themselves, but you can use it as a supplement when making judgments about the overall thematic content of a factor.
To generate the structure matrix we use a post-estimation command, estat structure, as a follow-up to our promax rotation. Clicking execute selection gives us the structure matrix: the zero-order correlations between each variable and the factors. In this case we see basically the same pattern in the correlations between the measured variables and their respective factors, but some of the other associations look a little larger, and that is because these are zero-order correlations: we are not partialing the other factor out when we look at the relationship between a given measured variable and a given factor. Again, you can use this matrix as additional information when describing the thematic content of a factor. I will also tell you that when I first ran this analysis I didn't specify that kappa parameter for promax, so to keep our results consistent with SPSS I retype the rotation as promax with a 4 inside the parentheses and rerun it, and the output then reflects that specification. The next piece I want to show you is how to generate the correlations among the factors, and there is another post-estimation command for that: estat common. When I highlight it and run it, I get the correlation between the two factors, which is 0.3718; that is something else you will likely want to report when writing up results following an oblique rotation. The last thing I want to show you is how to generate communalities. By default Stata does not produce communalities for the measured variables, and as I said earlier, the communality is the proportion of variation in each measured variable that is accounted for by the full set of retained factors. To generate them you can use the following syntax, which was also provided by Watkins in his step-by-step guide to exploratory factor analysis with Stata: I type matrix M = J(1, colsof(e(Psi)), 1) - e(Psi), with a capital P in Psi (it's very easy to make a mistake in here). What's happening is that we are pulling matrix information out of the previous estimates, specifically the stored uniquenesses, and computing the communality for each variable by subtracting its uniqueness from one. Then I type matrix list M, highlight it, and execute the selection, and now we have the communalities associated with each of our measured variables.
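A sketch of those post-estimation steps, following the Watkins syntax for the communalities:

    * structure matrix: zero-order correlations between variables and factors
    estat structure
    * correlations among the rotated (oblique) factors
    estat common
    * communalities = 1 - uniquenesses, pulled from the stored e(Psi) row vector
    matrix M = J(1, colsof(e(Psi)), 1) - e(Psi)
    matrix list M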
I'll mention that under the video description I have a PowerPoint and an example do-file, so you can study this syntax a little more closely. That is going to wrap up this video discussion. I know it is a lot to go through, but as you can see, factor analysis is not something where you can just click a few buttons and understand what you're doing; it requires thinking through the process strategically. As I went through this whole process, I also wanted to show you a general strategy for setting up a do-file that you can use not only for the current analysis but can also save and adapt for other analyses just by switching out the variable names and so forth. At any rate, that concludes this video, and I really appreciate you watching.