Measures of Spread

In this section we’re going to learn about measures of spread, measures that tell us how spread out or bunched up our data is.

Key Points

Measures of spread tell us how spread out our data is.

  • Range - difference between highest and lowest value
  • Inter-quartile range (IQR) - difference between upper and lower quartiles

Why are they used?

  • Is our data mostly clustered around the mean, or spread out?
  • Does one data set vary more than another?

Limitations

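  • Measures of spread don’t tell us everything about the distribution of our data, particularly when the data doesn’t follow a parametric distribution (for a normal distribution, the mean and standard deviation together describe it completely).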

Variance

Variance (\( \sigma^2 \)) is the average of the squared differences between each value and the mean of our data: we square each difference, add them all up, and divide by the size of our dataset.

\( \sigma^2 = \frac{\sum_{i=1}^{n}(x_i - \overline{x})^2}{n} \)

Where:

  • \( n \) is the size of our dataset
  • \( x_i \) is the \( i \)th data point of our dataset
  • \( \overline{x} \) is the mean of all the data points.
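The formula translates almost directly into code. Below is a minimal Java sketch (Java because the exercises later in this section use Java-style signatures). The class and method names Variance, mean and variance are our own illustrative choices, and the example data is made up.

```java
class Variance {

    // Arithmetic mean: the sum of the values divided by how many there are.
    static double mean(double[] arr) {
        double sum = 0.0;
        for (double x : arr) {
            sum += x;
        }
        return sum / arr.length;
    }

    // Variance as defined above: the sum of squared differences from the
    // mean, divided by n (the size of the dataset).
    static double variance(double[] arr) {
        double m = mean(arr);
        double sumSquares = 0.0;
        for (double x : arr) {
            double diff = x - m;
            sumSquares += diff * diff;
        }
        return sumSquares / arr.length;
    }

    public static void main(String[] args) {
        double[] data = {2, 4, 4, 4, 5, 5, 7, 9};
        System.out.println(variance(data)); // prints 4.0
    }
}
```

Here the mean is 5, the squared differences add up to 32, and 32 divided by 8 values gives 4.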

Standard Deviation

Standard deviation is a common measure of spread that is used in the calculation of other statistics. It is closely related to variance: the population standard deviation is simply the square root of the variance defined above.

There are two ways to calculate it, depending on whether our data \( x \) is a sample or a population. A population is all of the data there could be (e.g. the height of everyone in the world), and a sample is a subset of the population (e.g. the height of everyone in your study).

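The standard deviation of a sample divides by \( n - 1 \) instead of \( n \). This adjustment (Bessel’s correction) compensates for the fact that the values in a sample tend to sit closer to the sample’s own mean than to the true population mean, so dividing by \( n \) would underestimate the spread. It is calculated as follows: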

\( SD = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \overline{x})^2}{n - 1}} \)

The standard deviation of a population is calculated as follows:

\( SD = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \overline{x})^2}{n}} \)
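The two formulas differ only in their divisor. The Java sketch below computes the shared sum of squared differences once, then divides by \( n \) or \( n - 1 \) before taking the square root. The names StandardDeviation, populationSd and sampleSd are our own, and the example data is made up.

```java
class StandardDeviation {

    // Sum of squared differences between each value and the mean.
    static double sumOfSquaredDeviations(double[] arr) {
        double sum = 0.0;
        for (double x : arr) {
            sum += x;
        }
        double mean = sum / arr.length;
        double total = 0.0;
        for (double x : arr) {
            double diff = x - mean;
            total += diff * diff;
        }
        return total;
    }

    // Population standard deviation: divide by n, then take the square root.
    static double populationSd(double[] arr) {
        return Math.sqrt(sumOfSquaredDeviations(arr) / arr.length);
    }

    // Sample standard deviation: divide by n - 1 instead of n.
    // Needs at least two values, otherwise the divisor would be zero.
    static double sampleSd(double[] arr) {
        if (arr.length < 2) {
            throw new IllegalArgumentException("sample SD needs at least two values");
        }
        return Math.sqrt(sumOfSquaredDeviations(arr) / (arr.length - 1));
    }

    public static void main(String[] args) {
        double[] data = {2, 4, 4, 4, 5, 5, 7, 9};
        System.out.println(populationSd(data)); // prints 2.0
        System.out.println(sampleSd(data));     // sqrt(32 / 7), about 2.138
    }
}
```

For the same data the sample standard deviation is never smaller than the population one, because it divides the same total by a smaller number (they are equal only when every value is the same and both are zero).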


Questions

1. Check your understanding

Given the following data, answer the questions below: 3, 5, 5, 6, 7, 8, 8, 9

2. Implementing in Code

1. Range

Write a function double range(double[] arr) that takes an array of numbers and calculates their range.
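If you want to compare against your own attempt, here is one possible sketch. It finds the lowest and highest values in a single pass instead of sorting. Wrapping it in a class called Range and throwing an exception for an empty array are our own choices; the exercise doesn’t say how that case should be handled.

```java
class Range {

    // Range: the highest value minus the lowest value.
    static double range(double[] arr) {
        if (arr.length == 0) {
            throw new IllegalArgumentException("range of an empty array is undefined");
        }
        double min = arr[0];
        double max = arr[0];
        for (double x : arr) {
            min = Math.min(min, x);
            max = Math.max(max, x);
        }
        return max - min;
    }

    public static void main(String[] args) {
        System.out.println(range(new double[] {4, 1, 7, 3})); // prints 6.0
    }
}
```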

2. Inter-quartile Range

Write a function double iqr(double[] arr) that takes an array of numbers and calculates their inter-quartile range.

You might want to make use of some of your code from the last section for calculating the median.
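A possible sketch follows. Be aware that there are several conventions for calculating quartiles, and they can give slightly different answers. The one used here is an assumption on our part rather than a method stated by this course: sort a copy of the data, then take the median of the lower half (Q1) and of the upper half (Q3), leaving the middle value out of both halves when there is an odd number of values. The median helper is a hypothetical stand-in for your code from the last section, and the class name InterquartileRange is our own.

```java
import java.util.Arrays;

class InterquartileRange {

    // Hypothetical stand-in for your median code from the last section:
    // median of sorted[from..to), where the array is already sorted.
    static double median(double[] sorted, int from, int to) {
        int n = to - from;
        int mid = from + n / 2;
        if (n % 2 == 1) {
            return sorted[mid];
        }
        return (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    // IQR = Q3 - Q1, using medians of the lower and upper halves.
    // Assumed convention: for odd n the middle value belongs to neither half.
    static double iqr(double[] arr) {
        if (arr.length < 2) {
            throw new IllegalArgumentException("need at least two values");
        }
        double[] sorted = arr.clone(); // avoid reordering the caller's array
        Arrays.sort(sorted);
        int n = sorted.length;
        int half = n / 2;
        double q1 = median(sorted, 0, half);
        double q3 = median(sorted, n - half, n);
        return q3 - q1;
    }

    public static void main(String[] args) {
        double[] data = {1, 3, 4, 6, 7, 9, 10};
        System.out.println(iqr(data)); // prints 6.0
    }
}
```

In the example the lower half is 1, 3, 4 (so Q1 = 3) and the upper half is 7, 9, 10 (so Q3 = 9), giving an IQR of 6.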


Summary

In this section we have learned about measures of spread.

  • You should be able to calculate the range and inter-quartile range of a dataset.
  • With reference to the formulas, you should be able to calculate the variance and the standard deviation of a dataset.
  • You should understand why measures of spread are used.

In the next section we will learn about data visualisation.