Many statistical methods are concerned with the relationship between independent and dependent variables. But factor analysis goes a step further: it's a way to understand how the patterns of relationship between several manifest variables are caused by a smaller number of latent variables, according to their common aspects. These hidden variables are called factors.
Factor analysis began with psychologist Charles Spearman around a century ago. He noticed the huge variety of measures for cognitive acuity - visuo-spatial skill, artistic abilities, reasoning etc. - and wondered if one general, underlying intelligence variable (which he called g) could explain them all.
Though he wasn't quite right (later researchers proposed that intelligence comprises several distinct factors rather than a single one), scientists still use his methods today. The approach involves reducing a set of correlated [4] variables to a smaller, independent set of derived variables with minimum loss of information. Factor analysis is therefore a data condensation tool that removes redundancy or duplication from a set of correlated variables.
Remember that this method requires the data to be correlated, so all assumptions that apply to correlation are relevant here.
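A quick way to check this requirement is to look at the pairwise correlations before attempting any factor analysis. The sketch below is only an illustration: the three test names and all of the scores are made up, and `pearson` is a small helper written for this example rather than a library function.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical scores for five people on three made-up tests.
scores = {
    "reasoning": [12, 15, 11, 18, 14],
    "vocabulary": [30, 36, 29, 41, 33],
    "reaction_time": [7, 3, 8, 5, 6],
}

# Build the full correlation matrix as a dictionary keyed by variable pairs.
names = list(scores)
corr = {(a, b): pearson(scores[a], scores[b]) for a in names for b in names}

for a in names:
    print(a, [round(corr[(a, b)], 2) for b in names])
```

If most off-diagonal correlations are close to zero, there is little shared variation for factors to summarize, and factor analysis is unlikely to be useful.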
There are two main types of factor analysis:
- Principal component analysis: this method provides a unique solution from which the original data can be reconstructed, so it works in both directions: from the data to the solution and from the solution back to the data. The reconstruction is exact only when all of the factors are retained. The solution contains at most as many factors (usually called components) as there are variables.
- Common factor analysis: this technique uses an estimate of the common variance [5] shared among the original variables to generate the solution. The number of factors is always smaller than the number of original variables. When people say "factor analysis" without qualification, they usually mean common factor analysis.
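The contrast between the two types can be sketched on synthetic data. The example below assumes NumPy is available and invents everything else: the sample size, the three "true" loadings and the random seed are arbitrary choices for the demonstration. The principal component part uses a singular value decomposition, and the common factor part solves a one-factor model by hand from three correlations, which is a simplification of what statistical packages actually do.

```python
import numpy as np

rng = np.random.default_rng(1)   # fixed seed; all data here are synthetic
n = 5000
true_loadings = np.array([0.9, 0.7, 0.5])    # assumed values for the demo

# One latent factor drives three observed variables, each with unit variance.
factor = rng.normal(size=n)
unique_sd = np.sqrt(1 - true_loadings**2)
X = factor[:, None] * true_loadings + rng.normal(size=(n, 3)) * unique_sd

# --- Principal component analysis (via SVD of the centered data) ---
mean = X.mean(axis=0)
Xc = X - mean
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt.T                       # one column of scores per component
X_full = scores @ Vt + mean              # all 3 components: exact recovery
X_one = scores[:, :1] @ Vt[:1] + mean    # 1 component: approximation only
variance_share = S**2 / np.sum(S**2)     # share of total variance per component

# --- Common factor analysis (one-factor model, solved by hand) ---
R = np.corrcoef(X, rowvar=False)
r12, r13, r23 = R[0, 1], R[0, 2], R[1, 2]
# With one factor, r_ij = l_i * l_j for i != j, which three variables
# are enough to solve: l_1 = sqrt(r_12 * r_13 / r_23), and so on.
est_loadings = np.sqrt(np.array([r12 * r13 / r23,
                                 r12 * r23 / r13,
                                 r13 * r23 / r12]))
communality = est_loadings**2     # variance each variable shares with the factor
uniqueness = 1 - communality      # variance left to each variable alone

print("PCA max reconstruction error, 3 components:", np.abs(X - X_full).max())
print("PCA max reconstruction error, 1 component: ", np.abs(X - X_one).max())
print("estimated loadings:", np.round(est_loadings, 2))
```

The principal component part works with the total variance of each variable, which is why keeping every component returns the data unchanged. The common factor part only models the variance the variables share, leaving each variable's unique variance outside the solution.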
Factor analysis can be used in two key ways:
Identification of underlying factors
This is also called exploratory factor analysis. You may be presented with a huge data set and have no clue about its underlying structure or the various dimensions hidden within it. Factor analysis will allow you to identify the aspects common to those variables so they can be clustered into more manageable, homogeneous sets. Thus, new sets of variables can be created. Many psychological batteries and tests are developed in precisely this way.
Screening of variables
This is also called confirmatory factor analysis. This approach helps us to identify groupings of variables, so that when we select one variable to represent many, we can be confident that it accurately represents the larger set.
Suppose we want to develop a test that will allow a company to select applicants who are good team members. How would we go about it? Let's say a psychologist conducts an exploratory factor analysis on the company's requirements and discovers 20 different aspects or characteristics that make a good team member (for example "empathy" and "politeness").
Further factor analysis and testing on small samples reveal, however, that all 20 aspects are merely manifestations of just three main factors: communication skills, conscientiousness and extroversion. The psychologist can conduct further rounds of factor analysis, testing and refinement to find answers to two main questions:
- What is the minimum number of factors needed to explain all the variation we see in the company's data?
- How well do these factors describe ALL the data?
Eventually the psychologist can arrive at the main hidden factors in the data and design the inventory accordingly.
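The two questions above can be explored numerically. The sketch below assumes NumPy and uses entirely synthetic data that mimics the example: 20 items, each driven by one of three hidden factors. The sample size, the loading of 0.8 and the seed are arbitrary. It applies one common heuristic, the Kaiser rule (keep factors whose eigenvalue of the correlation matrix exceeds 1), which is only one of several ways to choose the number of factors and is not necessarily what a practitioner would rely on alone.

```python
import numpy as np

rng = np.random.default_rng(2)   # fixed seed; the questionnaire is synthetic
n_people, n_items, n_factors = 500, 20, 3

# Each item loads on exactly one of the three factors.
# The loading of 0.8 is an arbitrary choice for the demo.
loadings = np.zeros((n_items, n_factors))
for item in range(n_items):
    loadings[item, item % n_factors] = 0.8

factors = rng.normal(size=(n_people, n_factors))
noise = rng.normal(size=(n_people, n_items)) * 0.6   # 0.6 = sqrt(1 - 0.8**2)
responses = factors @ loadings.T + noise

# Eigenvalues of the item correlation matrix, largest first.
R = np.corrcoef(responses, rowvar=False)
eigenvalues = np.linalg.eigvalsh(R)[::-1]

# Question 1: how many factors? Kaiser rule: keep eigenvalues above 1.
n_retained = int(np.sum(eigenvalues > 1.0))

# Question 2: how well do they describe the data?
# Share of the total variance carried by the retained factors.
share = eigenvalues[:n_retained].sum() / eigenvalues.sum()

print("factors retained:", n_retained)
print("share of variance explained:", round(float(share), 2))
```

On real questionnaire data the picture is rarely this clean, which is why the psychologist in the example needs several rounds of analysis, testing and refinement.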