Generalization is an essential component of the wider scientific process. It is what allows researchers to take what they have learnt on a small scale and relate it more broadly to the bigger picture.
In a 2010 Behavioral and Brain Sciences paper titled “The weirdest people in the world?”, researchers found that psychology research is done primarily on WEIRD people, i.e. those who are “Western, Educated, Industrialized, Rich and Democratic.”
According to The New York Times,
“… 68 percent of research subjects in a sample of hundreds of studies in leading psychology journals came from the United States, and 96 percent from Western industrialized nations. Of the American subjects, 67 percent were undergraduates studying psychology — making a randomly selected American undergraduate 4,000 times likelier to be a subject than a random non-Westerner.”
Why does this matter? The answer lies in the fact that research conducted on WEIRD people is not generalizable.
In an ideal world, to test a hypothesis, you would sample an entire population. You would use every possible variation of an independent variable [4], and be ready to measure every possible dependent variable to get a very accurate understanding of the topic at hand. But in the vast majority of cases, this is not feasible, so a representative group is chosen to reflect the whole population.
The representative group allows researchers to move from specific observations to inferences about broader trends or patterns, i.e. it allows them to generalize.
Few researchers can conduct research on every member of a population. But what they can do is construct a “mini-population” which is as similar to the population as possible.
Population: The entire set of possible measurements.
Sample: A smaller selection of items from that set.
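The relationship between a population and a sample can be illustrated with a minimal Python sketch. The population here is made up (10,000 simulated test scores), and the helper name `draw_sample` is hypothetical; the point is only that a simple random sample is a smaller selection whose summary statistics approximate those of the full set.

```python
import random
import statistics

def draw_sample(population, n, seed=0):
    """Return a simple random sample of n items, drawn without replacement."""
    rng = random.Random(seed)
    return rng.sample(population, n)

# Hypothetical population: simulated test scores for 10,000 children.
# These numbers are invented for illustration only.
rng = random.Random(42)
population = [rng.gauss(70, 10) for _ in range(10_000)]

# The sample: a much smaller selection from that set.
sample = draw_sample(population, 200)

print("population mean:", round(statistics.mean(population), 2))
print("sample mean:    ", round(statistics.mean(sample), 2))
```

The two means will usually be close but not identical. The gap between them is the price paid for not measuring everyone.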
For any experiment [5], you need to consider the representativeness of your sample [6], the effects of time and your sample size.
You must ensure that the sample group is as truly representative of the whole population as possible.
For many experiments, time is critical because behaviors can change yearly, monthly or even by the hour.
The size of the group must allow the statistics [7] to be safely extrapolated to an entire population. A group that is too small may not accurately capture the variation in the broader population.
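One common way to see why small groups extrapolate poorly is the standard error of the mean, which shrinks with the square root of the sample size. The sketch below assumes simple random sampling and a hypothetical standard deviation of 10 points. It says nothing about whether the sample is representative, only how much random variation to expect.

```python
import math

def standard_error(sample_sd, n):
    """Standard error of the mean for a simple random sample of size n."""
    return sample_sd / math.sqrt(n)

# Hypothetical standard deviation of 10 points on a test.
for n in (25, 100, 400):
    print(n, standard_error(10, n))
```

Under these assumptions, quadrupling the sample size halves the standard error, so precision improves more and more slowly as the group grows.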
While there are some large-scale regional studies, such as the HUNT study [8] or the Decode Genetics study in Iceland, in reality it is usually not possible to sample the whole population, due to budget, time limits and feasibility.
For example, you may want to test a hypothesis [9] about the effect of an educational program on schoolchildren in the US.
For the perfect experiment [10], you would test every single child in the US using the program, against a control group [11]. If this number runs into the millions, this may not be possible without a huge number of researchers and a bottomless pit of money.
Thus, in order to generalize you need to select a sample group that is representative of the whole population.
A high-budget research project might take a smaller sample from every school in the country; a lower-budget operation may have to concentrate on one city or even a single school.
The key to generalization is to understand how far your results can be applied back to US children as a whole. The first example, using every school, would be strongly representative, because the range and number of samples are high. These samples more closely resemble the population being studied. Testing only one school makes generalization difficult and affects the external validity [12].
You might find that one school generates worse than average results for children using that particular educational program. However, a school in the next town might contain children who do better. The students may be from a completely different socioeconomic background or culture.
Critics of your results [13] will pounce upon such discrepancies and question your entire experimental design [10]. At best, you can now only generalize to that particular school, and cannot legitimately draw any conclusions about all US schoolchildren.
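The contrast between testing a single school and taking a small sample from every school can be illustrated with a short simulation. Everything in it is an assumption made for illustration: 50 hypothetical schools, 100 children each, and school-level differences that are larger than the differences between children within a school.

```python
import random
import statistics

rng = random.Random(7)

# Hypothetical population: 50 schools with 100 children each. Each school has
# its own average score shift (for example, from socioeconomic differences).
schools = []
for _ in range(50):
    school_shift = rng.gauss(0, 5)
    scores = [70 + school_shift + rng.gauss(0, 1) for _ in range(100)]
    schools.append(scores)

true_mean = statistics.mean(score for school in schools for score in school)

# Design A: test every child in a single school. Computed for each school in
# turn to show how much the answer depends on which school was picked.
single_school_estimates = [statistics.mean(school) for school in schools]

# Design B: test 5 randomly chosen children from every school.
sample_b = [score for school in schools for score in rng.sample(school, 5)]
all_schools_estimate = statistics.mean(sample_b)

print("true mean:           ", round(true_mean, 2))
print("one-school estimates:", round(min(single_school_estimates), 2),
      "to", round(max(single_school_estimates), 2))
print("all-schools estimate:", round(all_schools_estimate, 2))
```

In this setup the single-school design tests 100 children and the all-schools design tests 250, yet the main difference is not the head count. The one-school estimates swing widely depending on the school chosen, while the estimate drawn from every school stays close to the true mean.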
Representativeness is not just about the qualities of the population, but those qualities at a particular time.
Large sample groups could be gathered from schools all across the US. But if one group is tested at the beginning of the year and another at the end, the groups now differ from one another. Perhaps the latter children perform better simply because they are now slightly older than the former.
Good experiments consider the element of time and design research to minimize its effects.
The smaller a sample gets, the less likely it is to be representative. For example, let’s say that 1% of all US children would have done extremely well on the educational program. This is 1 in 100 children. If your sample size happens to only be 60 children, your sample may not contain a child who is likely to do very well in the program. Your sample is less representative because it is smaller. Your results will be different depending on whether your sample contains one of these children or not.
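The 1% example can be made concrete with a short calculation. The sketch below assumes each child is drawn independently from a very large population, so the chance that a sample of size n contains none of the high-performing children is (1 - 0.01) raised to the power n. The function names are hypothetical.

```python
import math

def prob_sample_misses_group(group_share, sample_size):
    """Probability that a random sample contains no member of a subgroup
    making up `group_share` of a very large population."""
    return (1.0 - group_share) ** sample_size

def min_sample_size(group_share, max_miss_prob):
    """Smallest sample size for which the chance of missing the subgroup
    entirely is at most `max_miss_prob`."""
    return math.ceil(math.log(max_miss_prob) / math.log(1.0 - group_share))

p_miss = prob_sample_misses_group(0.01, 60)
print(f"chance a sample of 60 contains none of the 1%: {p_miss:.3f}")  # 0.547
print("sample size for at most a 5% chance of missing them:",
      min_sample_size(0.01, 0.05))
```

Under these assumptions, a sample of 60 children has a better than even chance of containing none of the children who would do extremely well.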
Most statistical tests [7] have a built-in mechanism that takes sample size into account, with larger groups leading to results that are more significant [14].
The problem is that these tests cannot assess the validity [15] of the results or determine whether your generalizations are correct. This is something that must be taken into account when generating a hypothesis [16] and designing the experiment.
The other option, if the sample groups are small, is to use proximal similarity and restrict your generalization. This is where you accept that a limited sample group cannot represent all of the population.
If you sampled children from one town, it is dangerous to assume that they represent all children. It is, however, reasonable to assume that the results should apply to a similar-sized town with a similar socioeconomic profile. This is not perfect, but it certainly has more external validity [12] and would be an acceptable generalization.
Links
[1] https://explorable.com/what-is-generalization
[2] https://explorable.com/users/martyn
[3] https://explorable.com/users/Lyndsay%20T%20Wilson
[4] https://explorable.com/independent-variable
[5] https://explorable.com/conducting-an-experiment
[6] https://explorable.com/what-is-sampling
[7] https://explorable.com/statistics-tutorial
[8] http://www.ntnu.edu/research/research_excellence/hunt
[9] https://explorable.com/hypothesis-testing
[10] https://explorable.com/design-of-experiment
[11] https://explorable.com/scientific-control-group
[12] https://explorable.com/external-validity
[13] https://explorable.com/statistically-significant-results
[14] https://explorable.com/significance-test
[15] https://explorable.com/types-of-validity
[16] https://explorable.com/research-hypothesis