Measures of Spread
In this section we’re going to learn about measures of spread: measures that tell us how spread out or bunched up our data is.
Thirteen-minute video
Key Points
Measures of spread tell us how spread out our data is.
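- Range - difference between the highest and lowest values
- Inter-quartile range (IQR) - difference between the upper and lower quartiles
- Variance and standard deviation - based on the squared differences between each value and the mean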
Why are they used?
- Is our data mostly clustered around the mean, or spread out?
- Does one data set vary more than another?
Limitations
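- A single measure of spread doesn’t tell us everything about the distribution of our data. Two datasets can have the same range or standard deviation yet very different shapes (e.g. one skewed, one symmetric). Measures of spread are most informative when the data follows a known parametric distribution, such as the normal distribution, which is fully described by a few parameters.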
Variance
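Variance (\( \sigma^2 \)) is the average of the squared differences between each value and the mean of our data: square each value’s difference from the mean, add these up, and divide by the size of our dataset. Dividing by \( n \) gives the population variance; for a sample, divide by \( n - 1 \) instead, as with the standard deviation below.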
\( \sigma^2 = \frac{\sum_{i=1}^{n}(x_i - \overline{x})^2}{n} \)
Where:
- \( n \) is the size of our dataset
- \( x_i \) is the \( i \)th data point of our dataset
- \( \overline{x} \) is the mean of all the data points.
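The formula above can be sketched in Java, the language used by the coding questions later in this section. This is a minimal sketch assuming the data arrives as a double[]; the class name Variance and its methods mean and variance are hypothetical, chosen for illustration.

```java
// Hypothetical helper class: computes the variance as in the formula above,
// i.e. the sum of squared differences from the mean, divided by n.
class Variance {

    // Returns the arithmetic mean (x-bar) of the values.
    static double mean(double[] arr) {
        double sum = 0.0;
        for (double x : arr) {
            sum += x;
        }
        return sum / arr.length;
    }

    // Returns sigma^2 = sum((x_i - mean)^2) / n.
    static double variance(double[] arr) {
        if (arr.length == 0) {
            throw new IllegalArgumentException("variance of an empty array is undefined");
        }
        double m = mean(arr);
        double sumSquares = 0.0;
        for (double x : arr) {
            double diff = x - m;
            sumSquares += diff * diff;
        }
        return sumSquares / arr.length;
    }

    public static void main(String[] args) {
        // Illustrative data: the mean is 5, the squared deviations are 9, 1, 1, 9,
        // so the variance is 20 / 4 = 5.
        double[] data = {2, 4, 6, 8};
        System.out.println(variance(data)); // prints 5.0
    }
}
```

Squaring each difference stops values below the mean from cancelling out values above it, which is why the result is never negative.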
Standard Deviation
Standard deviation is a common measure of spread that is used in the calculation of many other statistics. It is the square root of the variance, so it is measured in the same units as the data itself (e.g. centimetres rather than square centimetres).
There are two ways to calculate it, depending on whether \( x \) is a sample or a population. A population is all of the data there could be (e.g. the height of everyone in the world), and a sample is a subset of the population (e.g. the height of everyone in your study).
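The standard deviation of a sample is calculated as follows. Dividing by \( n - 1 \) rather than \( n \) (known as Bessel’s correction) compensates for measuring deviations from the sample mean, which tends to sit closer to the sample’s own values than the true population mean does: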
\( SD = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \overline{x})^2}{n - 1}} \)
The standard deviation of a population is calculated as follows:
\( SD = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \overline{x})^2}{n}} \)
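Both formulas share the same numerator, so a sketch can compute it once and differ only in the divisor. The class StandardDeviation and its method names are hypothetical, and the example values are illustrative only.

```java
// Hypothetical helper class illustrating both standard deviation formulas.
class StandardDeviation {

    // Sum of squared differences from the mean: the numerator shared by both formulas.
    static double sumOfSquaredDeviations(double[] arr) {
        double sum = 0.0;
        for (double x : arr) {
            sum += x;
        }
        double mean = sum / arr.length;
        double total = 0.0;
        for (double x : arr) {
            total += (x - mean) * (x - mean);
        }
        return total;
    }

    // Population standard deviation: divide by n, then take the square root.
    static double populationSd(double[] arr) {
        if (arr.length == 0) {
            throw new IllegalArgumentException("need at least one value");
        }
        return Math.sqrt(sumOfSquaredDeviations(arr) / arr.length);
    }

    // Sample standard deviation: divide by n - 1, then take the square root.
    static double sampleSd(double[] arr) {
        if (arr.length < 2) {
            throw new IllegalArgumentException("need at least two values");
        }
        return Math.sqrt(sumOfSquaredDeviations(arr) / (arr.length - 1));
    }

    public static void main(String[] args) {
        // Illustrative values: mean 5, sum of squared deviations 32.
        double[] data = {2, 4, 4, 4, 5, 5, 7, 9};
        System.out.println(populationSd(data)); // sqrt(32 / 8) = 2.0
        System.out.println(sampleSd(data));     // sqrt(32 / 7), slightly larger
    }
}
```

For the same data the sample version is always a little larger than the population version, and the gap shrinks as \( n \) grows.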
Questions
1. Check your understanding
Given the following data, answer the questions below: 3, 5, 5, 6, 7, 8, 8, 9
2. Implementing in Code
1. Range
Write a function double range(double[] arr) that takes an array of numbers and calculates their range.
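One possible answer, offered as a sketch rather than the course’s own solution: scan the array once, tracking the smallest and largest values. Only the range signature comes from the question; the wrapper class RangeExercise is hypothetical. Throwing on an empty array is a design choice, since the range of no values is undefined.

```java
// Hypothetical wrapper class holding one possible solution to the range exercise.
class RangeExercise {

    // Returns the difference between the largest and smallest values in arr.
    static double range(double[] arr) {
        if (arr.length == 0) {
            throw new IllegalArgumentException("range of an empty array is undefined");
        }
        double min = arr[0];
        double max = arr[0];
        for (double x : arr) {
            min = Math.min(min, x);
            max = Math.max(max, x);
        }
        return max - min;
    }

    public static void main(String[] args) {
        double[] data = {12, 3, 7, 10}; // illustrative values
        System.out.println(range(data)); // prints 9.0 (12 - 3)
    }
}
```

A single pass avoids sorting, so this runs in linear time.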
2. Inter-quartile Range
Write a function double iqr(double[] arr) that takes an array of numbers and calculates their inter-quartile range.
You might want to make use of some of your code from the last section for calculating the median.
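A sketch of one possible answer follows. Quartiles have several competing definitions; this assumes the "median of each half" convention, where the overall median is left out of both halves when the number of values is odd. Other tools (spreadsheets, statistics libraries) may interpolate and give slightly different results. The median helper is a hypothetical stand-in for the code from the previous section, and the class name IqrExercise is also hypothetical.

```java
import java.util.Arrays;

// Hypothetical wrapper class: IQR using the "median of each half" quartile convention.
class IqrExercise {

    // Median of sorted[from, to): the middle value, or the mean of the two middle values.
    // Hypothetical stand-in for the median code from the previous section.
    static double median(double[] sorted, int from, int to) {
        int n = to - from;
        int mid = from + n / 2;
        if (n % 2 == 1) {
            return sorted[mid];
        }
        return (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    static double iqr(double[] arr) {
        if (arr.length < 2) {
            throw new IllegalArgumentException("need at least two values");
        }
        double[] sorted = arr.clone(); // sort a copy so the caller's array is unchanged
        Arrays.sort(sorted);
        int n = sorted.length;
        int half = n / 2;
        double q1 = median(sorted, 0, half);     // lower quartile: median of the lower half
        double q3 = median(sorted, n - half, n); // upper quartile: median of the upper half
        return q3 - q1;                          // middle value is skipped when n is odd
    }

    public static void main(String[] args) {
        double[] data = {7, 1, 5, 3, 6, 2, 4}; // illustrative values
        System.out.println(iqr(data)); // prints 4.0 (Q3 = 6, Q1 = 2)
    }
}
```

Sorting a copy rather than the input keeps the function free of side effects, at the cost of an extra array.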
Summary
In this section we have learned about measures of spread.
- You should be able to calculate the range and inter-quartile range of a dataset.
- With reference to the formulas, you should be able to calculate the variance and the standard deviation of a dataset.
- You should understand why measures of spread are used.
In the next section we will learn about data visualisation.