Variance is a statistical measure that quantifies the dispersion or spread of a set of data points. It provides a numerical value that indicates how much the data points in a dataset differ from the mean (average) of the dataset.
Mathematically, the variance of a dataset X with n data points, treated as a complete population, is calculated using the formula:
\(\text{Variance}(X) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2\)
Where:
- \(x_i\) represents each individual data point in the dataset.
- \(\bar{x}\) represents the mean (average) of the dataset.
- n is the total number of data points in the dataset.
In simpler terms, variance takes the difference between each data point and the mean of the dataset, squares it, and then averages these squared differences.
A larger variance indicates that the data points are more spread out from the mean, while a smaller variance indicates that the data points are closer to the mean. The square root of the variance is known as the standard deviation, which is another common measure of dispersion.
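To make the link between spread, variance, and standard deviation concrete, here is a minimal sketch using Python's standard `statistics` module. The two datasets (`tight` and `wide`) are made up for illustration. `statistics.pvariance` divides by n, which matches the formula above.

```python
import math
import statistics

# Two hypothetical datasets with the same mean (10) but different spread.
tight = [9.0, 10.0, 10.0, 11.0]
wide = [2.0, 6.0, 14.0, 18.0]

# pvariance divides by n, the same convention as the formula in this article.
tight_var = statistics.pvariance(tight)
wide_var = statistics.pvariance(wide)

# The standard deviation is the square root of the variance.
tight_std = math.sqrt(tight_var)
wide_std = math.sqrt(wide_var)

print(f"tight: variance={tight_var}, std={tight_std:.3f}")
print(f"wide:  variance={wide_var}, std={wide_std:.3f}")
```

Both datasets share the same mean, so the difference in variance reflects only how far the points sit from it.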
How to Calculate Variance
Calculating the variance of a dataset takes four steps:
1. Compute the Mean (Average):
Calculate the mean (average) of the dataset by summing up all the values and dividing by the total number of values (data points).
\(\text{Mean} (\bar{x}) = \frac{\sum_{i=1}^{n} x_i}{n}\)
2. Calculate the Squared Differences:
For each data point \(x_i\), subtract the mean \(\bar{x}\) and square the result. This gives the squared difference from the mean for each data point.
\((x_i - \bar{x})^2\)
3. Sum the Squared Differences:
Sum up all the squared differences calculated in the previous step.
\(\sum_{i=1}^{n} (x_i - \bar{x})^2\)
4. Divide by the Number of Data Points:
Divide the sum of squared differences by the total number of data points (denoted by n).
\(\text{Variance}(X) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2\)
This gives you the variance of the dataset.
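The four steps above can be sketched as a small Python function. The name `population_variance` and the empty-input check are illustrative choices, not part of the original procedure, and the input `[1, 2, 3, 4]` is a made-up dataset.

```python
def population_variance(data):
    """Return the variance of `data`, dividing by n as in the steps above."""
    n = len(data)
    if n == 0:
        # Assumption: treat an empty dataset as an error rather than return 0.
        raise ValueError("variance is undefined for an empty dataset")

    # Step 1: compute the mean.
    mean = sum(data) / n
    # Step 2: squared difference from the mean for each data point.
    squared_diffs = [(x - mean) ** 2 for x in data]
    # Step 3: sum the squared differences.
    total = sum(squared_diffs)
    # Step 4: divide by the number of data points.
    return total / n


result = population_variance([1, 2, 3, 4])
print(result)
```

Keeping each step on its own line mirrors the procedure and makes the intermediate values easy to inspect while debugging.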
Example of Variance
Suppose we have a dataset: \(X = \{3, 5, 7, 9, 11\}\)
Compute the Mean: \(\bar{x} = \frac{3 + 5 + 7 + 9 + 11}{5} = \frac{35}{5} = 7\)
Calculate the Squared Differences: For each data point:
\((3 - 7)^2 = 16\)
\((5 - 7)^2 = 4\)
\((7 - 7)^2 = 0\)
\((9 - 7)^2 = 4\)
\((11 - 7)^2 = 16\)
Sum the Squared Differences: 16 + 4 + 0 + 4 + 16 = 40
Divide by the Number of Data Points: Since there are 5 data points: \(\text{Variance}(X) = \frac{40}{5} = 8\)
So, the variance of the dataset X is 8.
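As a quick check, the worked example can be reproduced in Python and compared against the standard library. This is only a sketch: the variable names are illustrative. Note that `statistics.pvariance` divides by n as this example does, whereas `statistics.variance` divides by n - 1 and so returns a different value for the same data.

```python
import statistics

X = [3, 5, 7, 9, 11]

mean = sum(X) / len(X)                        # 7.0
squared_diffs = [(x - mean) ** 2 for x in X]  # [16.0, 4.0, 0.0, 4.0, 16.0]
total = sum(squared_diffs)                    # 40.0
variance = total / len(X)                     # 8.0

# Cross-check against the standard library (divides by n).
library_variance = statistics.pvariance(X)

# For comparison only: the n - 1 convention gives 40 / 4 instead of 40 / 5.
sample_variance = statistics.variance(X)

print(variance, library_variance, sample_variance)
```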