Hypothesis Testing
Inferential statistics are used for testing hypotheses - statements about how the world might be, e.g. “Driverless cars have fewer crashes than conventional cars”. We don’t know if this is true or not until we collect data to test it.
Watch the video and then answer the questions below.
Sixteen-minute video
You can also view this video on YouTube
You can find the slides here and also as .odp.
Key Points
Hypotheses
- Our alternate hypothesis is the hypothesis we’re testing, e.g. “Driverless cars have fewer crashes than conventional cars”
- Our null hypothesis says there is no effect, e.g. “Driverless cars have about the same number of crashes as conventional cars.”
P-Values
- A p-value tells us how likely it is that we would see results at least as extreme as ours if the null hypothesis were true, i.e. if only chance were at work.
- We compare it against our alpha (α) value, which is usually α = 0.05.
- If p < α, then our results are statistically significant.
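The comparison between p and α can be sketched in a few lines of Java. This is only an illustration, not code from the lecture: the class name `SignificanceCheck`, the method `isSignificant`, and the example p-values are all made up for this sketch.

```java
/** Minimal sketch: compare a p-value against a significance threshold (alpha). */
final class SignificanceCheck {

    /** The conventional threshold; other values of alpha are possible. */
    static final double CONVENTIONAL_ALPHA = 0.05;

    /** Returns true only when the p-value is strictly below alpha. */
    static boolean isSignificant(double pValue, double alpha) {
        if (pValue < 0.0 || pValue > 1.0) {
            throw new IllegalArgumentException("A p-value must lie between 0 and 1");
        }
        return pValue < alpha;
    }

    public static void main(String[] args) {
        // Hypothetical p-values, chosen only to show both outcomes.
        double[] examples = {0.20, 0.01, 0.05};
        for (double p : examples) {
            System.out.println("p = " + p + ", significant: "
                    + isSignificant(p, CONVENTIONAL_ALPHA));
        }
    }
}
```

Note that the test is strict: a p-value exactly equal to α is not below the threshold, so this sketch does not count it as significant.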
Errors
- A Type I error is falsely rejecting the null hypothesis.
- A Type II error is failing to reject the null hypothesis when it is actually false.
- The likelihood of making a Type II error is called beta (β). It is related to the power of the experiment: power = 1 - β.
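The relationship between β and power is simple enough to express as a one-line function. The sketch below is illustrative only: the class name `PowerSketch`, the method `powerFromBeta`, and the value β = 0.2 are assumptions made for the example.

```java
/** Minimal sketch of the relationship power = 1 - beta. */
final class PowerSketch {

    /** Converts a Type II error rate (beta) into statistical power. */
    static double powerFromBeta(double beta) {
        if (beta < 0.0 || beta > 1.0) {
            throw new IllegalArgumentException("beta is a probability, so it must lie between 0 and 1");
        }
        return 1.0 - beta;
    }

    public static void main(String[] args) {
        double beta = 0.2; // hypothetical Type II error rate
        System.out.println("beta = " + beta + ", power = " + powerFromBeta(beta));
    }
}
```

Read the other way round, an experiment with a power of 0.8 has a 20% chance of missing an effect that really exists.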
Effect Size
- Our effect size is how big the effect we have detected is.
Questions
1. Check your understanding
1. P-values
Assuming a conventional value of α = 0.05, say which of the following p-values are statistically significant:
2. Power
- You run an experiment to test if there is an effect of violence in video games using a statistical power of 0.4. You get a null result (i.e. you do not reject the null hypothesis). Is this interesting?
- You run the experiment again with a statistical power of 0.9 and get a null result. Is this interesting?
3. Multiple Testing
The Bonferroni correction is a way of adjusting the level of α if you are doing multiple testing. The α for each experiment is set to α/m, where α is your overall alpha and m is the number of experiments you are running.
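As a rough sketch, the correction could be written in Java as follows. The class name `BonferroniSketch`, its method names, and the numbers in `main` are invented for illustration, and are deliberately different from the ones in the questions.

```java
/** Minimal sketch of the Bonferroni correction: per-test alpha = overall alpha / m. */
final class BonferroniSketch {

    /** Returns the alpha each individual test should be compared against. */
    static double perTestAlpha(double overallAlpha, int numberOfTests) {
        if (numberOfTests < 1) {
            throw new IllegalArgumentException("numberOfTests must be at least 1");
        }
        return overallAlpha / numberOfTests;
    }

    /** Returns true when a single test's p-value is below the corrected alpha. */
    static boolean isSignificant(double pValue, double overallAlpha, int numberOfTests) {
        return pValue < perTestAlpha(overallAlpha, numberOfTests);
    }

    public static void main(String[] args) {
        double overallAlpha = 0.05;
        int m = 5;                          // hypothetical number of tests
        double[] pValues = {0.004, 0.02};   // hypothetical results

        System.out.println("Per-test alpha: " + perTestAlpha(overallAlpha, m));
        for (double p : pValues) {
            System.out.println("p = " + p + ", significant after correction: "
                    + isSignificant(p, overallAlpha, m));
        }
    }
}
```

With an overall α of 0.05 and 5 tests, each test is compared against 0.01, so p = 0.004 counts as significant while p = 0.02 does not, even though 0.02 would pass an uncorrected threshold of 0.05.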
- You run 8 significance tests maintaining an overall α = 0.05. You get a p-value for one test of 0.006. Should you claim significance?
- You run two experiments on video game addiction using a questionnaire to measure addiction. One checks if time played has an effect. The other checks if game type has an effect. You use the same data for both. You get results of p = 0.041 and p = 0.062. Should you claim a significant result?
- You use the same data to run two experiments with unrelated hypotheses. Do you need to adjust α due to multiple testing?
2. Maths to Code Practice
The formula for Cohen’s d (for equal-sized groups) is as follows:
\(d = \frac{M_2 - M_1}{SD_{pooled}} \)
Where
\( SD_{pooled} = \sqrt{ \frac{ SD_1^{ 2} + SD_2^{ 2} }{2}} \)
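As a worked example, suppose the two groups have the made-up summary statistics \( M_1 = 50 \), \( M_2 = 54 \), \( SD_1 = 1 \) and \( SD_2 = 7 \). These numbers are purely illustrative, chosen so the arithmetic comes out cleanly. Then
\( SD_{pooled} = \sqrt{ \frac{ 1^{2} + 7^{2} }{2} } = \sqrt{ \frac{50}{2} } = \sqrt{25} = 5 \)
and
\( d = \frac{54 - 50}{5} = 0.8 \)
A quick hand calculation like this one can be used to sanity-check whatever your code returns.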
Write a function in Java that takes two number arrays of equal length and returns d.
You might want to look at my code for calculating standard deviation from the maths to code lecture.
Summary
In this section we have learned about hypothesis testing. Once you’ve completed the questions, you can move on to the next section, where we will look at z scores.