Correlation

In this section we’re going to learn about correlation, which, as you may know, is not causation.

Key Points

  • Correlation is not causation.
  • There are different measures of correlation, but most express it as a value between -1 (perfect negative correlation) and 1 (perfect positive correlation).
  • A correlation of 0 means the measure finds no relationship of the kind it looks for (e.g. no linear relationship for Pearson’s r); the variables may still be related in some other way.

Scatterplots

You can use a scatterplot to eyeball correlation.

[Figure: three example scatterplots, labelled Positive, Negative, and No Correlation]

Pearson’s r

Pearson’s r gives a measure of correlation for parametric data. It measures the strength and direction of a linear relationship between two variables.

It’s also called the Pearson product-moment correlation coefficient.

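$$r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \, \sum_i (y_i - \bar{y})^2}}$$

where $\bar{x}$ and $\bar{y}$ are the means of x and y, and each sum runs over all the paired data points $(x_i, y_i)$.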
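To make the formula concrete, here is a minimal sketch in plain Python (no third-party libraries) that translates it term by term. The function name pearson_r is our own, hypothetical choice, not a standard API. The third call shows why a correlation of 0 does not mean the variables are unrelated: y = x² depends perfectly on x, yet r comes out as 0 because the relationship is not linear.

```python
import math


def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient for paired samples (hypothetical helper)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must be the same length, with at least two pairs")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: sum of the products of each pair's deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: square root of (sum of squared x deviations * sum of squared y deviations)
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when either variable is constant")
    return cov / math.sqrt(ss_x * ss_y)


print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))   # 1.0  (perfect positive)
print(pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2]))   # -1.0 (perfect negative)
print(pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))  # 0.0  (y = x**2: related, but not linearly)
```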

Spearman’s Rho

Spearman’s Rho gives a measure of correlation for non-parametric data. It checks whether there is a monotonic relationship between the variables (i.e., as x increases, y consistently increases, or consistently decreases), even if y does not change by a constant (linear) amount.

To do this, it calculates Pearson’s r on the ranks of the values rather than on the values themselves (i.e., only the order of the values relative to one another matters, not their exact size).
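The rank-then-correlate idea can be sketched in plain Python as below. This is an illustrative sketch, not a reference implementation: the helper names ranks, pearson_r and spearman_rho are hypothetical, and tied values are given the average of the ranks they span, which is one common convention. pearson_r is repeated so the block runs on its own.

```python
import math


def ranks(values):
    """Return 1-based ranks; tied values share the average of their positions (hypothetical helper)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Find the run of tied values starting at position i in sorted order
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # positions i..j correspond to ranks i+1..j+1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson_r(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r computed on the ranks instead of the raw values."""
    return pearson_r(ranks(xs), ranks(ys))


xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]  # monotonic but not linear
print(spearman_rho(xs, ys))  # 1.0
print(pearson_r(xs, ys))     # roughly 0.94, i.e. less than 1
```

For y = x³ the relationship is monotonic but not linear, so Spearman’s rho is exactly 1 while Pearson’s r is a little below 1.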

Linear Regression

Linear regression finds the linear model, or function, that relates the two variables with the minimum error.

A linear relationship is a straight line, which has the equation:

$$y = a + bx$$

That is, the y value of a point on the line is b times its x value, plus a. Here a is the intercept (the value of y where the line crosses the y-axis, at x = 0) and b is the gradient, or slope (how much y changes when x increases by 1).

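To find the values of a and b in this equation, we can use the least squares method, which picks the line that minimises the sum of the squared vertical distances between the data points and the line. The formulas for that method are: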

$$b = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}$$

$$a = \bar{y} - b\bar{x}$$
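A minimal sketch of these least squares formulas in plain Python, assuming paired lists of numbers with at least two distinct x values (otherwise the denominator of b is 0). The function name least_squares is hypothetical, not a library API.

```python
def least_squares(xs, ys):
    """Fit y = a + b*x by least squares and return (a, b) (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Gradient b: sum of products of deviations divided by sum of squared x deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    b = sxy / sxx
    # Intercept a: chosen so the fitted line passes through (mean_x, mean_y)
    a = mean_y - b * mean_x
    return a, b


xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]  # lies exactly on y = 1 + 2x
print(least_squares(xs, ys))  # (1.0, 2.0)
```

The numerator of b is the same sum of products that appears in Pearson’s r, which is one way to see how regression and correlation are connected.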


Summary

In this section we have learned about correlation.

  • You should know what correlation is.
  • You should be able to identify correlations on a scatterplot.
  • You should know what Pearson’s r, Spearman’s ρ, and linear regression are and roughly how they work.
  • You should be able to look these methods up if you need to use them.

You may now move on to the descriptive statistics challenges.