Hi, this is Farooq Hashmi from thinkingneuron.com. In this video, I'm going to help you understand another very famous dimension reduction technique called factor analysis. We will begin by understanding what dimension reduction is and where we use it, then we'll list out some popular algorithms for dimension reduction, and then we'll dive into the mathematics of factor analysis.
We'll understand exactly how factor analysis works, and we'll address a common confusion: factor analysis versus PCA. People often confuse these two algorithms because their workings are quite similar, so we will draw out the comparison between what factor analysis is doing and what PCA is doing. Alright, so let's get started. So what is dimension reduction? It is an unsupervised machine learning technique that helps us reduce a large number of columns to a small number of columns. For example, let's say you have a data set which you processed and explored, and finally, after doing all the correlation analysis and converting to dummy variables, you reach a stage where you have one target variable and multiple predictors: predictor 1, predictor 2, predictor 3, and so on up to predictor 500. So there are 500 predictor columns in your data and one target variable. But hold on, you just said that it is an unsupervised machine learning technique.
So how come this target variable comes into the picture? Well, the target variable doesn't come into the picture when you do dimension reduction. The problem you will face in using this data to fit your machine learning models is that you need to take all 500 columns into account. And data of this size or more, say 1000 or 2000 columns, is very common in text mining.
So whenever you are doing natural language processing, let's say you are trying to create a model for ticket classification, the ticket description data needs to be converted into numeric form, and then each word becomes one column in what is known as a document-term matrix. In such data sets, 1000 or 2000 predictor columns are very common.
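As a quick illustration, here is a minimal sketch of how ticket descriptions turn into a document-term matrix, using scikit-learn's CountVectorizer; the ticket texts are made-up examples, not from any real data set:

```python
# Minimal sketch: ticket descriptions -> document-term matrix.
# The ticket texts below are invented examples.
from sklearn.feature_extraction.text import CountVectorizer

tickets = [
    "printer not working on floor two",
    "cannot reset my email password",
    "email password expired again",
]

vectorizer = CountVectorizer()
dtm = vectorizer.fit_transform(tickets)    # one row per ticket, one column per word

print(vectorizer.get_feature_names_out())  # the words, i.e. the predictor columns
print(dtm.toarray())                       # word counts per ticket
```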
So in such scenarios, if you try to fit that data through a machine learning algorithm, it will take a large amount of time and it will be painfully slow. So instead of fitting all these 500 columns together, is there a way to shrink these 500 predictors (I am shrinking the number of predictors only) into some smaller number of columns? Let's call these new columns the very famous principal components: principal component 1, principal component 2, principal component 3 and principal component 4. So I am representing these 500 columns with just 4 columns, or maybe 1000 columns with 4 or 5 or 10 columns. How do we decide these columns? It varies from algorithm to algorithm.
Okay, so we will discuss that further on, but this is the overall idea: I am doing some work only on the predictor data, shrinking the predictors, representing a large number of predictors with a smaller number of predictors. How many predictors should I choose to represent these 500 predictors? This is also something that varies from algorithm to algorithm, which we'll discuss further. Okay, so once I have this representation of 500 columns with just 4 columns, I can now bring in the target variable and create new data. This new data is a representation of the old data, which had a high number of dimensions. What does dimension mean? Columns.
So these columns are nothing but dimensions; every column represents one dimension. So if there are many dimensions in my data because of X, Y, Z reasons, I can somehow represent it with a smaller number of dimensions, or a smaller number of columns. And these new dimensions are sometimes called factors and sometimes called principal components, and so on; different algorithms name them differently based on the way they combine the columns.
Okay. So once you do this, you have new data, and on this data you can fit your supervised machine learning algorithms. Because if the target variable is there, then definitely you are doing either regression or classification. So whatever the scenario, if you are trying to predict something which is a number, then you will fit a supervised regression machine learning model.
Or if it is a class, then a supervised classification machine learning model, depending on the target variable. Okay. But the crux is that your predictors are now shrunk to a smaller set of columns. Okay.
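To make this concrete, here is a minimal sketch of that crux, assuming scikit-learn and synthetic random data (so the model won't learn anything meaningful; it only shows the shape of the workflow):

```python
# Sketch: shrink 500 predictors to 4 components, then fit a supervised model.
# Data is random noise, used only to illustrate the workflow.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 500))      # 1000 rows, 500 predictor columns
y = rng.integers(0, 2, size=1000)     # a binary target variable

pca = PCA(n_components=4)
X_reduced = pca.fit_transform(X)      # represent 500 columns with just 4

model = LogisticRegression().fit(X_reduced, y)   # train on the new, smaller data
```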
The whole journey of how these predictors are shrunk to a smaller set of columns is what differs from algorithm to algorithm. For example, in factor analysis it happens one way, in principal component analysis it happens differently, in independent component analysis it happens differently again, and so on and so forth. So what is the need for dimension reduction? Why do we require this activity to be performed?
Because if you choose all the predictors, then your model training on this old, raw data will be very slow. So if your data has a high number of dimensions, what we call high dimensionality, or simply too many predictors (because the target variable is always one), the first impact on your machine learning training is that it will be slow. And this is a no-brainer: if you have more columns, then in order to learn from all those columns, to find the best columns among them, or to find the equation involving all the columns, you will have to do more computations. If you have fewer columns, you will have to do fewer computations, and your algorithm will run fast. So once you convert your data into the reduced dimensions, a smaller or basically compressed version of the overall data set, you will be able to train your models very, very fast. Also, due to slow model training, the model that gets created is complex, which will also impact the time of predictions. So all these issues will come up if you have very high dimensionality in your data set, and they can be resolved using dimension reduction. Alright.
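If you want to see the speed difference for yourself, here is a small sketch; the data is synthetic, and actual timings depend entirely on your machine, so only the relative difference matters:

```python
# Sketch: compare training time on 500 raw columns versus 10 components.
import time
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 500))
y = rng.integers(0, 2, size=2000)
X_small = PCA(n_components=10).fit_transform(X)   # compressed version of the data

for name, data in [("500 columns", X), ("10 components", X_small)]:
    start = time.perf_counter()
    RandomForestClassifier(n_estimators=50, random_state=0).fit(data, y)
    print(name, "->", round(time.perf_counter() - start, 2), "seconds")
```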
So to perform this, there are many algorithms out there, and I'll quickly describe them in an upcoming section. Now let us discuss when exactly we use dimension reduction. We use it in supervised machine learning, in unsupervised machine learning, and for data visualization. These are the three use cases where you can use dimension reduction.
And every time, the role of dimension reduction is to reduce the number of columns so that you can do a better job. For example, in supervised machine learning, when exactly do you employ the dimension reduction technique? Well, you have done your exploratory data analysis, you have done your correlation analysis, you have selected the features, and then you have converted the categorical or string columns into numbers. What will often happen there is that certain columns have too many unique values, especially the string columns, and when you do get_dummies, the number of variables increases drastically.
So each and every unique value becomes a predictor, and then you have your target variable. Obviously the data is now high dimensional; it has inflated. In this scenario, you can choose to shrink the predictors. So when do we perform this activity? Only after all these things are done and you are on the verge of applying machine learning.
So just before that, you take a call that there are too many predictors in the data, so let me try to represent them with a smaller number of predictors using dimension reduction techniques. Similarly, in unsupervised machine learning, you apply it after you are done converting the data into numeric form.
After this, if you feel that there are too many columns and it will slow down the training process of your unsupervised machine learning, especially clustering, you can take a call before clustering that there are too many columns. So instead of doing clustering on too many columns, let me shrink this data and do clustering on a smaller set of columns: instead of clustering on all these variables, let me do the clustering only on these few variables. Okay, I'm just ruling the target variable out of the picture right now.
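A minimal sketch of that flow, assuming scikit-learn and synthetic numeric data:

```python
# Sketch: shrink the columns first, then run clustering on the smaller set.
# Fully unsupervised: no target variable involved. Data is synthetic.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 100))                     # 500 rows, 100 predictors

X_reduced = PCA(n_components=5).fit_transform(X)    # 100 columns -> 5
labels = KMeans(n_clusters=3, n_init=10).fit_predict(X_reduced)

print(labels[:20])                                  # cluster assignment per row
```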
Similarly, for data visualization: if your data has too many columns, you cannot plot an n-dimensional graph of all those columns. It would be too computationally intensive, as well as hectic to code.
So if you still want to represent the data patterns, the variations across all these columns, using a smaller set of variables, all you have to do is choose the first two principal components or the first two factors, the ones which are really important.
Okay, so instead of using, let's say, 500 columns, all you have to do is choose the two factors or the two principal components which you have just generated, and based on these you can create a two-dimensional chart: principal component 1 versus principal component 2, for example. So the data can be visualized, its groupings can be studied, and clusters can be observed. When you plot multi-dimensional data in lower dimensions, or just two dimensions, you don't just blindly choose any two columns. What you are doing is choosing combinations of columns, because what PCA or factor analysis tries to do is represent multiple such columns as a single column. There might be redundant information spread across multiple columns which can be clubbed together and represented as one single principal component or one single factor. So by combining multiple columns together, I get my single column, or single component, or single factor. This is what dimension reduction performs for you.
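Here is a minimal sketch of that two-dimensional plot, again on synthetic data just to show the idea:

```python
# Sketch: visualize many-column data using only the first two principal components.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 50))               # 50 columns, too many to plot directly

pc = PCA(n_components=2).fit_transform(X)    # keep only PC1 and PC2

plt.scatter(pc[:, 0], pc[:, 1])
plt.xlabel("principal component 1")
plt.ylabel("principal component 2")
plt.show()
```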
Okay, now to perform all this, what algorithms do we have in place? We have algorithms like factor analysis; principal component analysis, famously known as PCA; independent component analysis, or ICA; t-SNE, which is t-distributed stochastic neighbor embedding; and the last one, UMAP, which is Uniform Manifold Approximation and Projection. These last two are primarily intended for data visualization, but they can also be used for supervised and unsupervised machine learning cases, like clustering, or let's say some form of natural language processing you are doing before classification or regression. In those scenarios also you can use them, but the most famous of them all is principal component analysis. So now let us try to understand how factor analysis works. Factor analysis tries to find the common factor hidden behind the individual columns, the factor which is driving them. For example, take the result of an employee satisfaction survey.
There are numbers assigned in the columns: the average rating, complaints, privileges, learning, raises, critical and advance. These are the different parameters on which the employees have rated the organization, and there must be certain hidden factors which are driving these ratings.
So if I show you the end result: for example, the work culture of an organization collectively affects how you rate it overall, how complaints are handled, and how privileges are given in the organization. All of this is part of one collective factor, you can say, called work culture.
Similarly, the kind of learning opportunities you are getting is summarized by one single column on its own, and the other three columns are collectively driven by promotions. So that is another underlying factor which affects the ratings from the employees on raises, critical feedback, and the kind of advancement they see inside the organization. So these three groups which are being created here, these three groups are basically the driving factors.
So if the work culture of an organization is good, then the employees will rate higher on the first column, rating, as well as on complaints and privileges; all three columns will get higher ratings in general if the work culture is good. If there are learning opportunities, then the learning column will get a higher rating.
If promotions and the career graph have been good so far in this organization, then raises, critical feedback and advancement will all get higher scores; if they have been bad, these will get lower scores. So when we look at the data, we are looking at individual columns. Individual columns are saying something to us, but what factor is hiding behind these columns for the rating to average 43, or 63, or 71?
What is the driving force? What is the common cause? That common cause is nothing but a factor.
Okay. And this is what we try to do in factor analysis: we try to find the hidden factors which are driving a group of columns to behave in a certain manner, to take values in a certain manner. Right.
So factor analysis is also sometimes known as exploratory factor analysis, wherein you try to understand which groups of columns are behaving together, or you can say are connected to each other in some way. This hidden factor is what connects all these columns together. So in an unknown data set, finding out these hidden factors is nothing but factor analysis. Alright.
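For a hands-on feel, here is a minimal sketch using scikit-learn's FactorAnalysis. The column names follow the survey example above, but the values are random, so the loadings you get will not reproduce the work culture, learning and promotions grouping discussed here; with real survey data, columns that load strongly on the same factor are the ones that group together.

```python
# Sketch: exploratory factor analysis on survey-style columns.
# Values are random noise, used only to show the mechanics.
import numpy as np
import pandas as pd
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
cols = ["rating", "complaints", "privileges", "learning",
        "raises", "critical", "advance"]
df = pd.DataFrame(rng.normal(size=(30, len(cols))), columns=cols)

fa = FactorAnalysis(n_components=3).fit(df)

# components_ holds the loadings: how strongly each column loads on each
# hidden factor. With real data, large loadings on the same factor reveal
# which columns a common driving force connects.
loadings = pd.DataFrame(fa.components_, columns=cols)
print(loadings.round(2))
```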
So the kind of equations factor analysis creates is slightly different from the kind of equations principal component analysis creates. For example, over here, the equations would look something like this:

rating = alpha × (work culture) + c1
complaints = beta × (work culture) + c2
So these are the equations which are formed during factor analysis, and this algorithm helps us to find out the values of alpha, beta, c1 and c2 so that we get the values of our factors. Now, in principal component analysis, the equations which are formed look slightly different from this.
They begin with the factors first. So principal component 1, let's say work culture is principal component 1, would be something like:

PC1 = alpha1 × rating + alpha2 × complaints + alpha3 × privileges + alpha4 × learning + ...

and so on and so forth.
And now, let's say I have to combine rating, complaints and privileges together to form principal component 1. Then I'll increase those coefficients and decrease the other coefficients' values; they might come out like 0.7, 0.2, 0.3, while the rest would be very small values like 0.001 or 0.002. So this is the kind of equation you get in principal component analysis, and the earlier one is the kind of equation you get in factor analysis. So there is a very basic difference between factor analysis and principal component analysis.
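As a small sketch of where those alpha coefficients live in code: scikit-learn's PCA exposes them as components_, one row per principal component (synthetic data again):

```python
# Sketch: the alphas of the PCA equation are the entries of components_.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 7))        # 7 survey-style columns

pca = PCA(n_components=2).fit(X)

# Row 0 corresponds to principal component 1: one coefficient per original column.
print(pca.components_[0].round(3))
```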
Principal component analysis tries to explain most of the data's variance with a few columns, which we call principal components. Factor analysis, on the other hand, tries to find out the hidden factors, the hidden driving forces behind the values of these columns.
Why were these values observed? Because these factors were at play. Okay, so the thinking process behind factor analysis and PCA is different, but the end goal is common.
We try to represent the whole data with a smaller set of columns, a smaller set of combined values. Okay, so that is the common link between factor analysis and PCA, and that is why they are often confused with each other.
But based on these equations, you will be able to draw the distinction between the two techniques. Alright. So I hope factor analysis is clear to you and you will be able to explain it in your interview with confidence.
So all the best for that. And I hope you crack it.