Student's T-Test
We are going to learn about a common inferential statistic called Student’s t-test (it was published under the pseudonym ‘Student’). It is a so-called parametric test, because it assumes your data follows a normal distribution.
Watch the video and then answer the questions below.
Thirty four-minute video
You can also view this video on YouTube
Key Points
- The t-test calculates the t statistic
- The t statistic can be converted to a p-value by comparing it to the t-distribution
- The t-test assumes our data is normally distributed
- T-tests can compare only up to two groups
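To make the second key point concrete, here is a minimal sketch of how a t statistic could be turned into a two-tailed p-value using only the Python standard library. The function names `t_pdf` and `two_tailed_p`, the use of Simpson's rule, and the example numbers are illustrative assumptions, not part of the course material. In practice a statistics package would do this for you.

```python
import math

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    # lgamma avoids overflow for large df
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=10_000):
    """Two-tailed p-value: area under the t-distribution beyond +/-|t|."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_zero_to_t = total * h / 3  # area between 0 and |t|
    return 2 * (0.5 - area_zero_to_t)

# Hypothetical example: t = 2.101 with 18 degrees of freedom.
# 2.101 is the tabulated two-tailed critical value for alpha = 0.05,
# so the p-value should come out close to 0.05.
print(round(two_tailed_p(2.101, 18), 3))
```

The larger the absolute t statistic, the smaller the area left in the tails, and so the smaller the p-value.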
One-sample and paired t-test
If we want to compare a group mean against a known value (e.g. the population mean \( \mu \)), or the mean of a group of differences against a known value (e.g. \( \mu = 0 \)), we use the following formula:
\[ t = \frac{\overline{x} - \mu}{\frac{s}{\sqrt{n}}} \]
We use the t-test (as opposed to the z-test) when we don’t know the population standard deviation \( \sigma \), so we use the sample standard deviation \( s \). Because of the uncertainty in calculating \( s \), we get a t-statistic instead of a z-statistic, and have to compare it on a t-distribution.
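As a rough illustration of the formula above, the sketch below computes a one-sample t statistic, and a paired t statistic by applying the same formula to the differences. The function names and the data are hypothetical and chosen only for the example.

```python
import math
import statistics

def one_sample_t(sample, mu):
    """t = (mean - mu) / (s / sqrt(n)), with s the sample standard deviation."""
    n = len(sample)
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)  # uses the n - 1 denominator
    return (mean - mu) / (s / math.sqrt(n))

def paired_t(first, second):
    """Paired t-test: a one-sample test on the differences against mu = 0."""
    diffs = [b - a for a, b in zip(first, second)]
    return one_sample_t(diffs, 0)

# Hypothetical puzzle scores compared against a theoretical mean of 3
scores = [2, 4, 4, 6, 9]
print(round(one_sample_t(scores, 3), 2))

# Hypothetical paired measurements from the same three participants
resting = [1, 2, 3]
after_coffee = [2, 4, 3]
print(round(paired_t(resting, after_coffee), 2))
```

Note that `statistics.stdev` already divides by \( n - 1 \), which matches the sample standard deviation \( s \) used in the formula.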
Two-sample t-test
If we want to compare a difference between group means (\(\overline{x}_1 - \overline{x}_2\)) against an expected difference (e.g. 0 as in the formula below), we use the following formula:
\[ t = \frac{(\overline{x}_1 - \overline{x}_2) - 0 }{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \]
Where \( \overline{x}_1 \) and \( \overline{x}_2 \) are the means of the two groups, \( n_1 \) and \( n_2 \) are the number of observations in the two groups, and \( s_1 \) and \( s_2 \) are the standard deviations of the two groups.
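The two-sample formula can be sketched in the same way. The function name `two_sample_t`, the `expected_diff` parameter and the data below are assumptions made for illustration only.

```python
import math
import statistics

def two_sample_t(group1, group2, expected_diff=0):
    """t = ((mean1 - mean2) - expected_diff) / sqrt(s1^2/n1 + s2^2/n2)."""
    n1, n2 = len(group1), len(group2)
    mean_diff = statistics.mean(group1) - statistics.mean(group2)
    # statistics.variance returns the sample variance s^2 (n - 1 denominator)
    standard_error = math.sqrt(
        statistics.variance(group1) / n1 + statistics.variance(group2) / n2
    )
    return (mean_diff - expected_diff) / standard_error

# Hypothetical measurements for two independent groups
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10]
print(round(two_sample_t(group_a, group_b), 2))
```

The sign of the result only reflects which group was listed first. For a two-tailed test it is the absolute value that matters.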
Questions
1. Check your understanding
1. Pick the appropriate statistical test formula
| Expression | | \( z = \frac{x - \mu}{\sigma} \) | \( z = \frac{\overline{X} - \mu}{\frac{\sigma}{\sqrt{n}}} \) | \( t = \frac{\overline{x} - \mu}{\frac{s}{\sqrt{n}}} \) | \( t = \frac{(\overline{x}_1 - \overline{x}_2) - 0 }{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \) | |
|---|---|---|---|---|---|---|
| 1. | I compare the average height of two groups | | | | | |
| 2. | I compare a group’s performance in a puzzle against a theoretical mean that assumes completely random behaviour | | | | | |
| 3. | I assess the IQ of a group assuming \( \mu = 100, \sigma = 15 \) | | | | | |
| 4. | I investigate if drinking coffee increases a participant’s heart rate compared to a resting value | | | | | |
| 5. | I run a counterbalanced game enjoyment study. Each participant plays two games and rates each of them. I want to see if one game is more enjoyable than the other | | | | | |
2. Calculate the t statistic
I collect 10 sensor readings each from 2 sensors. I want to see if there is a difference between the means of their readings.
| Group 1 | Group 2 |
|---|---|
| 0.4 | 6.3 |
| 3.6 | -1.2 |
| 3.3 | -11.3 |
| 1.5 | -6.3 |
| -1.7 | -5 |
| 0.1 | -3.4 |
| 4.2 | 2.4 |
| -1.8 | 14.7 |
| 1.9 | -2.9 |
| -3.6 | 9.9 |
We should use a:
We get a t statistic of:
(2 decimal places)
Here there are 20 data points. Because we “spend” 2 of them to calculate the mean for each group, we are left with 20 - 2 = 18 degrees of freedom (df). We do a two-tailed test at \( \alpha = 0.05 \). Look up the critical value for our df and \( \alpha \) in a table of t-statistics. If the absolute value of our t is larger than the critical value listed, we have significance.
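The table lookup described above could be sketched as follows. The dictionary holds a few two-tailed critical values for \( \alpha = 0.05 \) copied from a standard t-table. The names `CRITICAL_T_05`, `degrees_of_freedom` and `is_significant` are hypothetical, and the t values passed in are made up rather than taken from the sensor data.

```python
# Two-tailed critical t values for alpha = 0.05, keyed by degrees of freedom
CRITICAL_T_05 = {9: 2.262, 10: 2.228, 18: 2.101, 20: 2.086, 30: 2.042}

def degrees_of_freedom(n1, n2):
    """One degree of freedom is spent on each group's mean."""
    return n1 + n2 - 2

def is_significant(t, df, table=CRITICAL_T_05):
    """Significant if |t| exceeds the critical value listed for df."""
    return abs(t) > table[df]

df = degrees_of_freedom(10, 10)
print(df)                        # 18
print(is_significant(-2.5, df))  # True: |-2.5| > 2.101
print(is_significant(1.5, df))   # False: 1.5 < 2.101
```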
Is our result significant?
Summary
In this section we have learned about Student’s t-test: what it assumes and how to calculate the t statistic for one-sample, paired and two-sample comparisons. Once you’ve completed the questions, you can move on to the inferential statistics challenges.