Statistical variance measures how the data are spread about the mean, or expected value. Unlike the range, which only looks at the extremes, the variance takes every data point into account to describe how the data are distributed.
In many statistical analyses and experiments, the variance provides invaluable information about how the data are distributed.
The mathematical formula to calculate the variance is given by:
σ² = ∑ (X - µ)² / N
where:
σ² = variance
∑ (X - µ)² = the sum of (X - µ)² over all data points
X = individual data points
µ = mean of the population
N = number of data points
This means the variance is the average of the squared differences between the data points and the mean.
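As a rough illustration, the definition above can be written as a short Python function. This is a minimal sketch, not a library API: the name `population_variance` is our own, and the function assumes the data are the entire population, so it divides by N.

```python
from typing import Sequence


def population_variance(data: Sequence[float]) -> float:
    """Return the population variance: the mean of the squared differences from µ."""
    n = len(data)  # N, the number of data points
    if n == 0:
        raise ValueError("variance requires at least one data point")
    mu = sum(data) / n  # µ, the population mean
    # ∑ (X - µ)², summed over all data points
    squared_diffs = sum((x - mu) ** 2 for x in data)
    return squared_diffs / n  # divide by N
```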
For example, suppose you want to find the variance of the scores on a test, and the scores are 67, 72, 85, 93 and 98.
First, write down the formula for variance:
σ² = ∑ (x - µ)² / N
Next, there are five scores in total, so N = 5.
σ² = ∑ (x - µ)² / 5
Calculate the mean (µ) of the five scores: (67 + 72 + 85 + 93 + 98) / 5 = 415 / 5, so µ = 83.
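The parentheses in the mean calculation matter. A quick Python check using the five test scores from this example shows that without them, only the last score is divided by 5:

```python
scores = [67, 72, 85, 93, 98]

# Without parentheses, division binds tighter than addition: only 98 is divided by 5
without_parens = 67 + 72 + 85 + 93 + 98 / 5

# With parentheses, the whole sum is divided by N = 5, which gives the mean
mu = (67 + 72 + 85 + 93 + 98) / 5

# The same mean, computed directly from the list of scores
mu_from_list = sum(scores) / len(scores)

print(mu, mu_from_list)  # 83.0 83.0
```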
σ² = ∑ (x - 83)² / 5
Now, write out the difference between each score (x = 67, 72, 85, 93, 98) and the mean (µ = 83):
σ² = [ (67 - 83)² + (72 - 83)² + (85 - 83)² + (93 - 83)² + (98 - 83)² ] / 5
Carry out the subtraction inside each pair of parentheses:
67 - 83 = -16
72 - 83 = -11
85 - 83 = 2
93 - 83 = 10
98 - 83 = 15
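The subtraction step can be sketched in Python as below. A useful sanity check (a general property of the mean, not something specific to this dataset) is that the differences from the mean always add up to zero, which is why they are squared before being averaged rather than averaged directly.

```python
scores = [67, 72, 85, 93, 98]
mu = sum(scores) / len(scores)  # µ = 83.0

# Difference between each score and the mean: x - µ
deviations = [x - mu for x in scores]

print(deviations)       # [-16.0, -11.0, 2.0, 10.0, 15.0]
print(sum(deviations))  # 0.0
```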
The formula will now look like this:
σ² = [ (-16)² + (-11)² + (2)² + (10)² + (15)² ] / 5
Then, square each term in parentheses. We get 256, 121, 4, 100 and 225.
This is how:
σ² = [ (-16) × (-16) + (-11) × (-11) + (2) × (2) + (10) × (10) + (15) × (15) ] / 5
σ² = [ 16 × 16 + 11 × 11 + 2 × 2 + 10 × 10 + 15 × 15 ] / 5
which equals:
σ² = [256 + 121 + 4 + 100 + 225] / 5
Then add up the numbers inside the brackets:
σ² = 706 / 5
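The squaring and summing steps look like this in Python, starting from the differences computed earlier (-16, -11, 2, 10 and 15):

```python
# Differences from the mean, taken from the subtraction step
deviations = [-16, -11, 2, 10, 15]

# Square each difference; a negative times a negative is positive
squares = [d * d for d in deviations]

# Add the squares together: ∑ (x - µ)²
total = sum(squares)

print(squares)  # [256, 121, 4, 100, 225]
print(total)    # 706
```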
To get the final answer, divide the sum by 5 (because there are five scores in total). This gives the variance of the dataset:
σ² = 141.2
This is the variance of the population of scores.
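To double-check the whole calculation, the sketch below repeats it in Python and compares the result with `statistics.pvariance` from Python's standard library, which computes the population variance (dividing by N):

```python
import statistics

scores = [67, 72, 85, 93, 98]
n = len(scores)                                    # N = 5
mu = sum(scores) / n                               # µ = 83.0
variance = sum((x - mu) ** 2 for x in scores) / n  # 706 / 5

print(variance)                      # 141.2
print(statistics.pvariance(scores))  # 141.2
```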
In many cases, instead of a population, we deal with samples.
In this case, the formula for variance changes slightly to:
S² = ∑ (X - X̄)² / (N - 1)
where:
S² = the variance of the sample
X̄ = the mean of the sample
N = the number of data points in the sample
Note that the denominator is one less than the sample size in this case.
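The effect of the smaller denominator is easy to see in code. The sketch below treats the same five test scores as if they were a sample from a larger group (an assumption made purely for illustration) and compares the result with `statistics.variance`, the standard-library function that divides by N - 1:

```python
import statistics

scores = [67, 72, 85, 93, 98]
n = len(scores)
mean = sum(scores) / n
sum_sq = sum((x - mean) ** 2 for x in scores)  # 706.0, as in the worked example

population_var = sum_sq / n    # divide by N     -> σ²
sample_var = sum_sq / (n - 1)  # divide by N - 1 -> S²

print(population_var, sample_var)   # 141.2 176.5
print(statistics.variance(scores))  # 176.5
```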
The concept of variance can be extended to continuous data too. In that case, instead of summing the squared differences from the mean, we integrate them, weighting each value by its probability density: σ² = ∫ (x - µ)² f(x) dx, where f(x) is the probability density function. This approach is also useful when the number of data points is very large, for example the population of a country.
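For a continuous distribution, the integrals can be approximated numerically. The following Python sketch uses the midpoint rule, a simple numerical-integration method chosen here only for illustration. The helper names `integrate` and `continuous_variance` are hypothetical, and the code assumes the density is zero outside a finite interval [a, b]. As a check, it uses the uniform distribution on [0, 1], whose exact variance is 1/12.

```python
from typing import Callable


def integrate(g: Callable[[float], float], a: float, b: float, steps: int = 100_000) -> float:
    """Approximate the integral of g over [a, b] with the midpoint rule."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))


def continuous_variance(pdf: Callable[[float], float], a: float, b: float) -> float:
    """Variance of a continuous distribution whose density `pdf` is zero outside [a, b]."""
    mu = integrate(lambda x: x * pdf(x), a, b)                # µ = ∫ x f(x) dx
    return integrate(lambda x: (x - mu) ** 2 * pdf(x), a, b)  # σ² = ∫ (x - µ)² f(x) dx


def uniform_pdf(x: float) -> float:
    """Density of the uniform distribution on [0, 1] (constant inside the interval)."""
    return 1.0


print(continuous_variance(uniform_pdf, 0.0, 1.0))  # close to 1/12 ≈ 0.08333
```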
Variance is used extensively in probability theory and statistics, where more general conclusions need to be drawn from a smaller sample. Because variance describes how the data are spread around the mean, it tells us roughly how far from the mean we can expect an unknown data point to fall.
Siddharth Kalla, Lyndsay T Wilson (Mar 15, 2009). Statistical Variance. Retrieved Sep 14, 2024 from Explorable.com: https://explorable.com/statistical-variance
The text in this article is licensed under the Creative Commons-License Attribution 4.0 International (CC BY 4.0).
This means you're free to copy, share and adapt any parts (or all) of the text in the article, as long as you give appropriate credit and provide a link/reference to this page.