Challenges

We will be working on these challenges in the lab session for this topic.

You do not have to complete these in advance. However, you may get started on them beforehand if you wish. You still need to attend the lab even if you have completed the challenges! If you do not complete them in the lab, it is recommended that you finish them in your own time.

Significance

We have seen how to calculate z and t scores. However, when reporting our experiments we need to turn them into p values. Either we can calculate the accurate p value (complicated), or we can use a table to look it up!

Task 1: Example

The mean and standard deviation of the age of a mother at the birth of their first child in England and Wales are:

Mean	28.8	Source
SD	5.91	Source

In a study of 100 first time mothers from a university town it is found that the mean age of first birth is 30.12. Should we conclude that this difference is statistically significant?

Calculate the appropriate statistic.
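
One way to check your working is a short Python sketch. It assumes the usual z formula for a sample mean, \( z = \frac{\overline{x} - \mu}{\sigma / \sqrt{n}} \), which this page does not restate. The function and parameter names are made up for illustration.

```python
import math


def z_statistic(sample_mean, population_mean, population_sd, n):
    """z score of a sample mean against a known population mean and SD."""
    # The standard error shrinks as the sample gets larger.
    standard_error = population_sd / math.sqrt(n)
    return (sample_mean - population_mean) / standard_error


# Values from the task: population mean 28.8, SD 5.91, sample of 100 with mean 30.12.
z = z_statistic(30.12, 28.8, 5.91, 100)
print(round(z, 3))
```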

Task 2: Z Scores to P Values

To convert a z-score to a p value we need to look up the z score in a table appropriate for the kind of test we’re doing.

This table shows us p values for different z scores. If we have a score of 1.142 we would check the tenths 1.1 row and the hundredths 0.04 column. If we were using the two-tailed z table, the answer would be 0.25429. This would be our p value. Not significant.

If we’re doing a one tailed test we need to use the correct table, depending on whether we expect a positive or negative result.
In a two-tailed test, if our score is negative, that just indicates our value is smaller than the mean. We ignore the sign when using the two-tailed table.
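
If you would rather compute the p value than read it from a table, the lookup can be sketched with Python's standard library. This is only a sketch: `p_value_from_z` and its `tail` argument are hypothetical names, and the three tail options mirror the three kinds of table described above.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def p_value_from_z(z, tail="two"):
    """Convert a z score to a p value.

    tail is a hypothetical parameter for this sketch:
    "two"   - two-tailed test (the sign of z is ignored)
    "upper" - one-tailed test expecting a positive effect
    "lower" - one-tailed test expecting a negative effect
    """
    if tail == "two":
        return 2 * (1 - STANDARD_NORMAL.cdf(abs(z)))
    if tail == "upper":
        return 1 - STANDARD_NORMAL.cdf(z)
    if tail == "lower":
        return STANDARD_NORMAL.cdf(z)
    raise ValueError(f"unknown tail: {tail!r}")


# The row 1.1 / column 0.04 lookup corresponds to z = 1.14.
print(round(p_value_from_z(1.14), 5))
```

For z = 1.14 the two-tailed result should agree with the table value of 0.25429 to about four decimal places.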

If we wanted to check whether we had achieved significance at 0.05 without finding the p value, we can find the z score corresponding to a p value of 0.05. This z score is called a critical threshold. If our hypothesis test’s z score is greater in magnitude than this critical threshold, we have a significant result.

What is the critical threshold for z at an alpha of 0.05 using a two tailed test?
What is the critical threshold for z at an alpha of 0.05 using a one tailed test assuming a negative effect?
What is the critical threshold for z at an alpha of 0.05 using a one tailed test assuming a positive effect?
Does the experiment described above achieve significance?
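
Once you have tried these with the table, a critical threshold can also be computed by inverting the normal distribution. The sketch below uses `NormalDist.inv_cdf` from Python's standard library. `critical_z`, `is_significant` and the `tail` labels are hypothetical names chosen for this example.

```python
from statistics import NormalDist


def critical_z(alpha, tail="two"):
    """z score whose tail area (or areas) add up to alpha."""
    standard_normal = NormalDist()
    if tail == "two":
        # Split alpha evenly between the two tails.
        return standard_normal.inv_cdf(1 - alpha / 2)
    if tail == "upper":
        return standard_normal.inv_cdf(1 - alpha)
    if tail == "lower":
        return standard_normal.inv_cdf(alpha)
    raise ValueError(f"unknown tail: {tail!r}")


def is_significant(z, alpha, tail="two"):
    """Compare a z score against the critical threshold for the chosen test."""
    threshold = critical_z(alpha, tail)
    if tail == "two":
        return abs(z) > threshold
    if tail == "upper":
        return z > threshold
    return z < threshold


for tail in ("two", "upper", "lower"):
    print(tail, round(critical_z(0.05, tail), 3))
```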

Task 3: T Scores to P Values and Degrees of Freedom

To convert a T score to a p value, we need a table of t scores. With this we can find critical values for different common alpha levels for the t statistic.

To use this table we need to find our degrees of freedom. Why? Because the shape of the t distribution depends on the sample size \( n \).

The degrees of freedom in a t test are equal to \(n - 1 \) for a one-sample t test, and \( n - 2 \) for a two-sample t test.

Imagine we measure the length of 10 Holopleura, a genus of Longhorn Beetle in a particular tree. Suppose that our results show a mean length of 1.34 cm and a standard deviation of 0.45 cm. Our field guide says that such beetles have a mean length of 1 cm. Is this a significant result?

Should we use a one-tailed test or a two-tailed test?
Calculate the t statistic
Calculate the degrees of freedom
What alpha level is our result significant at? e.g. \( p < 0.1, p < 0.05, p < 0.01 \)
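
The t statistic and degrees of freedom for a one-sample test like this can be sketched in Python. This assumes the standard one-sample formula \( t = \frac{\overline{x} - \mu}{s / \sqrt{n}} \), which this page does not restate. The function names are illustrative only. Looking up the result still needs a t table, because the standard library has no t distribution.

```python
import math


def one_sample_t(sample_mean, expected_mean, sample_sd, n):
    """t statistic for a sample mean against a hypothesised mean."""
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - expected_mean) / standard_error


def degrees_of_freedom_one_sample(n):
    """Degrees of freedom for a one-sample t test: n - 1."""
    return n - 1


# Beetle example: 10 beetles, mean 1.34 cm, SD 0.45 cm, field guide mean 1 cm.
t = one_sample_t(1.34, 1.0, 0.45, 10)
df = degrees_of_freedom_one_sample(10)
print(round(t, 3), df)
```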

Effect Size

When we do a parametric test (i.e. one that assumes a normal distribution in our data), we can calculate the effect size. If we can, we should, because effect sizes are very useful to know. Cohen’s D is a common measure of effect size expressed in terms of standard deviations.

Task 4: Cohen’s D

Calculate the effect size for the birth age experiment
How would this be conventionally described?
Calculate the effect size for the Holopleura experiment
How would this be conventionally described?

Cohen’s D for a Z Test

\( d = \frac{\overline{x} - \mu}{\sigma} \)

For a T Test

\(d = \frac{\overline{X}_2 - \overline{X}_1}{SD_{pooled}} \)

Here \( \overline{X}_1 \) and \( \overline{X}_2 \) are the group means.

If you have equal sized groups:

\( SD_{pooled} = \sqrt{ \frac{ SD_1^{ 2} + SD_2^{ 2} }{2}} \)

Otherwise:

\( SD_{pooled} = \sqrt{ \frac{ (n_1 - 1) \times SD_1^2 + (n_2 - 1) \times SD_2^2 }{n_1 + n_2 - 2}} \)

Where \( n_1 \) and \( n_2 \) are the sample sizes of each group and \( SD_1 \) and \( SD_2 \) are the standard deviations of each group.
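
These formulas translate fairly directly into Python. The sketch below uses made-up function names, and the numbers in the usage lines are hypothetical rather than taken from either experiment. Only the general pooled formula is implemented, because it gives the same answer as the simpler one when the groups are the same size.

```python
import math


def cohens_d_z(sample_mean, population_mean, population_sd):
    """Cohen's d for a z test: difference in means in units of sigma."""
    return (sample_mean - population_mean) / population_sd


def pooled_sd(sd1, sd2, n1, n2):
    """Pooled standard deviation, weighting each group by n - 1."""
    numerator = (n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2
    return math.sqrt(numerator / (n1 + n2 - 2))


def cohens_d_t(mean1, mean2, sd1, sd2, n1, n2):
    """Cohen's d for two groups: (mean2 - mean1) / pooled SD."""
    return (mean2 - mean1) / pooled_sd(sd1, sd2, n1, n2)


# Hypothetical groups, purely for illustration.
print(cohens_d_t(mean1=10, mean2=12, sd1=2, sd2=2, n1=20, n2=20))
print(round(pooled_sd(3, 4, 10, 10), 4))
```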

Conventional values

Interpret the result against these conventional values:

Cohen’s D	Conventional description
\(d = 0.2\)	Small effect
\(d = 0.5\)	Medium effect
\(d = 0.8\)	Large effect
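
If you want to label a result automatically, one possible reading of the table is sketched below. Treating 0.2, 0.5 and 0.8 as the lower edges of bands, and calling anything under 0.2 "negligible", are assumptions of this sketch. The table itself only gives three reference points.

```python
def describe_effect(d):
    """Map Cohen's d to a conventional label (band edges are an assumption)."""
    size = abs(d)  # the sign only tells us the direction of the effect
    if size >= 0.8:
        return "Large effect"
    if size >= 0.5:
        return "Medium effect"
    if size >= 0.2:
        return "Small effect"
    return "Negligible effect"


for d in (0.1, 0.3, -0.6, 1.2):
    print(d, describe_effect(d))
```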

Chi Squared

Another common statistical test is the Chi Squared Test. This is for comparing counts of nominal variables.

\[ \chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i} \]

Where \( O_i \) is an observed value, \( E_i \) is an expected value, and \( n \) is the number of such values.

Imagine I ask 100 people for their favorite size of dice. I have 5 observed values. If I expected people to have no preference, my expected values would be that 1/5 of the sample liked each type of dice.

	d4	d6	d10	d12	d20
Observed	5	50	10	15	20
Expected	20	20	20	20	20

Here our degrees of freedom are 5 - 1 = 4.

Once you’ve got a \( \chi^2 \) value, you can look up the p value in a table.
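
The sum in the formula can be sketched as a short Python function. `chi_squared` is a made-up name, and the lists are the observed and expected rows from the dice table. You still need a table to turn the result into a p value.

```python
def chi_squared(observed, expected):
    """Sum of (O - E)^2 / E over matching observed and expected counts."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must be the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Counts for d4, d6, d10, d12, d20 from the table.
observed = [5, 50, 10, 15, 20]
expected = [20, 20, 20, 20, 20]

statistic = chi_squared(observed, expected)
degrees_of_freedom = len(observed) - 1
print(statistic, degrees_of_freedom)
```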

Task 5: Chi-Squared Example

What is the \( \chi^2 \) for the above data?
What is the p value?
What does this tell us about people’s preferences with regards to dice?

Task 6: Comparing groups

I can also use the Chi Squared Test to compare two different groups. The groups might be men and women, gamers and non-gamers, people treated with drug A or B, and so on. The outcomes might be if they prefer chocolate or vanilla ice cream, whether or not they own a pony, and so on.

	Outcome A	Outcome B	Sum
Group A	10	20	10+20 = 30
Group B	15	5	15 + 5 = 20
Sum	10+15 = 25	20+5 = 25	50

We have an observed value per Group/Outcome pair. We find an expected value for each observed value by working out, given the number of participants in each group and outcome, how many we would expect to have in that pair if they were distributed randomly.

Here the degrees of freedom are \(df = ( rows -1) \times (columns -1)\).

What would the expected value look like? Try and work this out yourself before taking a look at a step by step walkthrough if you’re stuck.
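
If you are stuck, here is one way the calculation could be sketched in Python. It assumes the usual rule that each expected count is row total × column total ÷ grand total, which is what "distributed randomly" amounts to here. The function names are invented for this example.

```python
def expected_counts(table):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)
    return [
        [row_total * column_total / grand_total for column_total in column_totals]
        for row_total in row_totals
    ]


def chi_squared_table(table):
    """Chi squared statistic and degrees of freedom for a table of counts."""
    expected = expected_counts(table)
    statistic = sum(
        (o - e) ** 2 / e
        for observed_row, expected_row in zip(table, expected)
        for o, e in zip(observed_row, expected_row)
    )
    degrees_of_freedom = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, degrees_of_freedom


# Rows are Group A and Group B, columns are Outcome A and Outcome B.
observed = [[10, 20], [15, 5]]
print(expected_counts(observed))
print(chi_squared_table(observed))
```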

What is the \( \chi^2 \) for the above data?
What is the p value?
What does this tell us about the two groups?
